Abstract

This work is intended as a contribution to a wavelet-based adaptive estimator of the memory parameter 
in the classical semi-parametric framework for Gaussian stationary processes. In particular we introduce 
and develop the choice of a data-driven optimal bandwidth. Moreover, we establish a central limit theorem 
for the estimator of the memory parameter with the minimax rate of convergence (up to a logarithmic factor).
The quality of the estimators is assessed by simulations.

1 Introduction 

Let $X = (X_t)_{t \in \mathbb{Z}}$ be a second-order zero-mean stationary process and let its covariogram be defined by

$$ r(t) = E(X_0 \cdot X_t), \quad \text{for } t \in \mathbb{Z}. $$

Assume the spectral density $f$ of $X$, with

$$ f(\lambda) = \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} r(k)\, e^{-ik\lambda}, $$

exists and represents a continuous function on $[-\pi, 0) \cup (0, \pi]$. Consequently, the spectral density of $X$
should satisfy the asymptotic property

$$ f(\lambda) \sim C\, |\lambda|^{-D} \quad \text{when } \lambda \to 0, $$

with $D < 1$ called the "memory parameter" and $C > 0$. If $D \in (0, 1)$, the process $X$ is a so-called long-memory
process; otherwise $X$ is called a short-memory process (see Doukhan et al., 2003, for more details).



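This paper deals with two semi-parametric frameworks:

• Assumption A1: $X$ is a zero-mean stationary Gaussian process with spectral density satisfying

$$ f(\lambda) = |\lambda|^{-D} \cdot f^*(\lambda) \quad \text{for all } \lambda \in [-\pi, 0) \cup (0, \pi], $$

with $f^*(0) > 0$ and $f^* \in \mathcal{H}(D', C_{D'})$ where $0 < D'$, $0 < C_{D'}$ and

$$ \mathcal{H}(D', C_{D'}) = \big\{ g : [-\pi, \pi] \to \mathbb{R}_+ \text{ such that } |g(\lambda) - g(0)| \le C_{D'} \cdot |\lambda|^{D'} \text{ for all } \lambda \in [-\pi, \pi] \big\}. $$

• Assumption A1': $X$ is a zero-mean stationary Gaussian process with spectral density satisfying

$$ f(\lambda) = |\lambda|^{-D} \cdot f^*(\lambda) \quad \text{for all } \lambda \in [-\pi, 0) \cup (0, \pi], $$

with $f^*(0) > 0$ and $f^* \in \mathcal{H}'(D', C_{D'})$ where $0 < D'$, $C_{D'} > 0$ and

$$ \mathcal{H}'(D', C_{D'}) = \big\{ g : [-\pi, \pi] \to \mathbb{R}_+ \text{ such that } g(\lambda) = g(0) + C_{D'} |\lambda|^{D'} + o(|\lambda|^{D'}) \text{ when } \lambda \to 0 \big\}. $$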
Remark 1. A great number of earlier works concerning the estimation of the long-range parameter in a
semi-parametric framework (see for instance Giraitis et al., 1997, 2000) are based on Assumption A1 or an
equivalent assumption on $f$. Another expression (see Robinson, 1995, Moulines and Soulier, 2003, or Moulines
et al., 2007) is $f(\lambda) = |1 - e^{i\lambda}|^{-2d} \cdot f^*(\lambda)$ with $f^*$ a function such that
$|f^*(\lambda) - f^*(0)| \le f^*(0) \cdot \lambda^{\beta}$ and $0 < \beta$. It is obvious that for $\beta \le 2$ such an
assumption corresponds to Assumption A1 with $D' = \beta$. Moreover, following arguments developed in Giraitis
et al. (1997, 2000), if $f^* \in \mathcal{H}(D', C_{D'})$ with $D' > 2$ is such that $f^*$ is $s \in \mathbb{N}^*$ times
differentiable around $\lambda = 0$ with $f^{*(s)}$ satisfying a Lipschitz condition of degree $0 < \ell < 1$ around 0,
then $D' \le s + \ell$. So for our purpose, $D'$ is a more pertinent parameter than $s + \ell$ (which is often used in
the non-parametric literature). Finally, Assumption A1' is a necessary condition to study the following adaptive
estimator of $D$.

We have $\mathcal{H}'(D', C_{D'}) \subset \mathcal{H}(D', C_{D'})$. Fractional Gaussian noises (with $D' = 2$) and
FARIMA[p,d,q] processes (also with $D' = 2$) are the first and best-known examples of processes satisfying
Assumption A1' (and therefore Assumption A1).

Remark 2. In Andrews and Sun (2004), an adaptive procedure covers a more general class of functions than
$\mathcal{H}(D', C_{D'})$, i.e. $\mathcal{H}_{AS}(D', C_{D'})$ defined by:

$$ \mathcal{H}_{AS}(D', C_{D'}) = \Big\{ g : [-\pi, \pi] \to \mathbb{R}_+ \text{ such that, as } \lambda \to 0,\ g(\lambda) = g(0) + \sum_{i=0}^{k} C_{2i} \lambda^{2i} + C_{D'} |\lambda|^{D'} + o(|\lambda|^{D'}) \text{ with } 2k < D' \le 2k + 2 \Big\}. $$

Unfortunately, the adaptive wavelet-based estimator defined below, like local or global log-periodogram estimators,
is unable to adapt to such a class (and therefore, when $D' > 2$, its convergence rate will be the same as
if the spectral density were included in $\mathcal{H}_{AS}(2, C_2)$, contrary to the Andrews and Sun estimator).

This work provides a wavelet-based semi-parametric estimation of the parameter $D$. This method was
introduced by Flandrin (1989) and numerically developed by Abry et al. (1998, 2001) and Veitch et al.
(2003). Asymptotic results are reported in Bardet et al. (2000) and more recently in Moulines et al. (2007).
Compared with these papers, two points of our work can be highlighted: first, a central limit theorem is
established under conditions which are weaker than those in Bardet et al. (2000). Secondly, we define an auto-driven
estimator $\widetilde{D}_N$ of $D$ (its definition being different from that in Veitch et al., 2003). This results in a
central limit theorem satisfied by $\widetilde{D}_N$, and this estimator is proved to be rate optimal up to a
logarithmic factor (see below). We develop this point below.

Define the usual Sobolev space $W(\beta, L)$ for $\beta > 0$ and $L > 0$,

$$ W(\beta, L) = \Big\{ g(\lambda) = \sum_{\ell \in \mathbb{Z}} g_\ell\, e^{2\pi i \ell \lambda} \in L^2([0,1]) \ \Big/ \ \sum_{\ell \in \mathbb{Z}} (1 + |\ell|)^{\beta} |g_\ell| < \infty \ \text{ and } \ \sum_{\ell \in \mathbb{Z}} |g_\ell|^2 < L \Big\}. $$

Let $\psi$ be a "mother" wavelet satisfying the following assumption:

Assumption W(∞): $\psi : \mathbb{R} \to \mathbb{R}$ with $[0,1]$-support and such that

1. $\psi$ is included in the Sobolev class $W(\infty, L)$ with $L > 0$;

2. $\int_0^1 \psi(t)\, dt = 0$ and $\psi(0) = \psi(1) = 0$.

A consequence of the first point of this assumption is: for all $p > 0$, $\sup_{\lambda \in \mathbb{R}} |\widehat{\psi}(\lambda)| (1 + |\lambda|)^p < \infty$,
where $\widehat{\psi}(u) = \int_0^1 \psi(t)\, e^{-iut}\, dt$ is the Fourier transform of $\psi$. A useful consequence of the second
point is $\widehat{\psi}(u) \sim C\, u$ for $u \to 0$, with $|C| < \infty$ a real number not depending on $u$.

The function $\psi$ is a smooth compactly supported function (the interval $[0,1]$ is chosen for better readability,
but the following results can be extended to another interval) with its $m$ first vanishing moments. If
$D' \le 2$ and $0 < D < 1$ in Assumption A1, Assumption W(∞) can be replaced by a weaker assumption:

Assumption W(5/2): $\psi : \mathbb{R} \to \mathbb{R}$ with $[0,1]$-support and such that

1. $\psi$ is included in the Sobolev class $W(5/2, L)$ with $L > 0$;

2. $\int_0^1 \psi(t)\, dt = 0$ and $\psi(0) = \psi(1) = 0$.



Remark 3. The choice of a wavelet satisfying Assumption W(∞) is quite restricted because of the required
smoothness of $\psi$. For instance, the function $\psi(t) = (t^2 - t + a) \exp(-1/(t(1 - t)))$ with $a \simeq 0.23087577$
satisfies Assumption W(∞). The class of "wavelets" satisfying Assumption W(5/2) is larger. For instance, $\psi$ can
be a dilated Daubechies "mother" wavelet of order $d$ with $d \ge 6$ to ensure the smoothness of the function $\psi$. It
is also possible to apply the following theory to an "essentially" compactly supported "mother" wavelet like the
Lemarié-Meyer wavelet. Note that it is not necessary to choose $\psi$ to be a "mother" wavelet associated with a
multi-resolution analysis of $L^2(\mathbb{R})$, as in the recent paper of Moulines et al. (2007). The whole theory can be
developed without this assumption, in which case the choice of $\psi$ is larger.

If $Y = (Y_t)_{t \in \mathbb{R}}$ is a continuous-time process, for $(a, b) \in \mathbb{R}_+^* \times \mathbb{R}$, the "classical" wavelet
coefficient $d(a, b)$ of the process $Y$ for the scale $a$ and the shift $b$ is

$$ d(a, b) = \frac{1}{\sqrt{a}} \int_{\mathbb{R}} \psi\Big(\frac{t}{a} - b\Big)\, Y_t\, dt. \qquad (1) $$

However, this formula (1) of a wavelet coefficient cannot be computed from a time series. The support of
$\psi$ being $[0,1]$, let us take the following approximation of formula (1) and define the wavelet coefficients of
$X = (X_t)_{t \in \mathbb{Z}}$ by

$$ e(a, b) = \frac{1}{\sqrt{a}} \sum_{k=1}^{N} \psi\Big(\frac{k}{a} - b\Big)\, X_k, \qquad (2) $$

for $(a, b) \in \mathbb{N}^* \times \mathbb{Z}$. Note that this approximation is the same as the wavelet coefficient computed from the
Mallat algorithm for an orthogonal discrete wavelet basis (for instance with the Daubechies mother wavelet).

Remark 4. Here a continuous wavelet transform is considered. The discrete wavelet transform, where $a = 2^j$,
which is numerically very interesting (using the Mallat cascade algorithm), is just a particular case. The main
point in studying a continuous transform is to offer a larger number of "scales" for computing the data-driven
optimal bandwidth (see below).

Under Assumption A1, for all $b \in \mathbb{Z}$, the asymptotic behavior of the variance of $e(a, b)$ is a power law in the
scale $a$ (when $a \to \infty$). Indeed, for all $a \in \mathbb{N}^*$, $(e(a, b))_{b \in \mathbb{Z}}$ is a Gaussian stationary process and
(see more details in Section 5):

$$ E(e^2(a, 0)) \sim f^*(0)\, K_{(\psi, D)} \cdot a^{D} \quad \text{when } a \to \infty, \qquad (3) $$

with a constant $K_{(\psi, D)}$ such that

$$ K_{(\psi, \alpha)} = \int_{-\infty}^{\infty} |\widehat{\psi}(u)|^2 \cdot |u|^{-\alpha}\, du > 0 \quad \text{for all } \alpha < 1, \qquad (4) $$

where $\widehat{\psi}$ is the Fourier transform of $\psi$ (the existence of $K_{(\psi, \alpha)}$ is established in Section 5). Note that
(3) also holds without the Gaussian hypothesis in Assumption A1 (the existence of the second-order moment
of $X$ is sufficient).
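The power law in the scale $a$ can be motivated by a short heuristic computation. The sketch below is not the proof given in Section 5: it assumes that the sum over $k$ can be replaced by an integral (which requires $N \ge a$ and $a$ large) and that $f^*$ can be replaced by $f^*(0)$ near the origin. It only uses $r(k) = \int_{-\pi}^{\pi} f(\lambda) e^{ik\lambda} d\lambda$ and the definition of $\widehat{\psi}$.

```latex
\begin{align*}
E\big(e^2(a,0)\big)
  &= \frac{1}{a} \sum_{k,k'=1}^{N} \psi\Big(\frac{k}{a}\Big)\,\psi\Big(\frac{k'}{a}\Big)\, r(k-k')
   = \frac{1}{a} \int_{-\pi}^{\pi} f(\lambda)\,
     \Big|\sum_{k=1}^{N} \psi\Big(\frac{k}{a}\Big) e^{-ik\lambda}\Big|^2 d\lambda \\
  &\approx a \int_{-\pi}^{\pi} f(\lambda)\,\big|\widehat{\psi}(a\lambda)\big|^2\, d\lambda
   = \int_{-a\pi}^{a\pi} f\Big(\frac{u}{a}\Big)\,\big|\widehat{\psi}(u)\big|^2\, du \\
  &\approx f^*(0)\, a^{D} \int_{-\infty}^{\infty} \big|\widehat{\psi}(u)\big|^2\, |u|^{-D}\, du
   = f^*(0)\, K_{(\psi,D)}\, a^{D}.
\end{align*}
```

The second line uses $\sum_k \psi(k/a) e^{-ik\lambda} \approx a\, \widehat{\psi}(a\lambda)$ and the change of variable $u = a\lambda$; the last line uses $f(u/a) = |u/a|^{-D} f^*(u/a)$.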

The principle of the wavelet-based estimation of $D$ is linked to this power law $a^D$. Indeed, let $(X_1, \dots, X_N)$
be a sampled path of $X$ and define $\widehat{T}_N(a)$, a sample variance of $e(a, \cdot)$ obtained from an appropriate choice of
shifts $b$, i.e.

$$ \widehat{T}_N(a) = \frac{1}{[N/a]} \sum_{k=1}^{[N/a]} e^2(a, k - 1). \qquad (5) $$
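As an illustration, the wavelet coefficients $e(a,b)$ and the sample variance (5) can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the wavelet is the polynomial $\psi_{LM}$ given in the simulations section, and the white-noise input is an assumption made only for the example.

```python
import math
import random

def psi_lm(t):
    """Wavelet psi_LM of the simulations section; it vanishes outside (0, 1)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return 100.0 * t**2 * (t - 1.0)**2 * (t**2 - t + 3.0 / 14.0)

def wavelet_coefficient(x, a, b, psi=psi_lm):
    """e(a, b) = a^(-1/2) * sum_{k=1..N} psi(k/a - b) * X_k, with x[k-1] = X_k."""
    # psi(k/a - b) is nonzero only for a*b < k < a*(b+1)
    first = max(1, a * b + 1)
    last = min(len(x), a * (b + 1) - 1)
    total = sum(psi(k / a - b) * x[k - 1] for k in range(first, last + 1))
    return total / math.sqrt(a)

def sample_variance(x, a, psi=psi_lm):
    """T_N(a) = (1/[N/a]) * sum_{k=1..[N/a]} e(a, k-1)^2, as in (5)."""
    n_shifts = len(x) // a
    return sum(wavelet_coefficient(x, a, k - 1, psi) ** 2
               for k in range(1, n_shifts + 1)) / n_shifts

# Illustration on Gaussian white noise (an assumption made only for this example)
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
for a in (4, 8, 16, 32):
    print(a, sample_variance(x, a))
```

Because the wavelet integrates to zero, a constant series yields coefficients that are numerically close to zero, which is a convenient sanity check of an implementation.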

Then, when $a = a_N \to \infty$ satisfies $\lim_{N \to \infty} a_N \cdot N^{-1/(2D'+1)} = \infty$, a central limit theorem for
$\log(\widehat{T}_N(a_N))$ can be proved. More precisely we get

$$ \log(\widehat{T}_N(a_N)) = D \log(a_N) + \log\big(f^*(0) K_{(\psi, D)}\big) + \sqrt{\frac{a_N}{N}} \cdot \varepsilon_N, $$

with $\varepsilon_N \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}(0, \sigma^2_{(\psi, D)})$ and $\sigma^2_{(\psi, D)} > 0$. As a consequence, using different scales
$(r_1 a_N, \dots, r_\ell a_N)$ where $(r_1, \dots, r_\ell) \in (\mathbb{N}^*)^\ell$, with $a_N$ a "large enough" scale, a linear regression of
$(\log(\widehat{T}_N(r_i a_N)))_i$ on $(\log(r_i a_N))_i$ provides an estimator $\widehat{D}(a_N)$ which satisfies a central limit theorem
with a convergence rate $\sqrt{N/a_N}$.

But the main problem is: how to select a large enough scale, considering that the smaller $a_N$, the faster
the convergence rate of $\widehat{D}(a_N)$? An optimal solution would be to choose $a_N$ larger than but close to
$N^{1/(2D'+1)}$, but the parameter $D'$ is supposed to be unknown. In Veitch et al. (2003), an automatic selection
procedure is proposed using a chi-squared goodness-of-fit statistic. This procedure is applied successfully on a
large number of numerical examples, however without any theoretical proofs. Our present method is close to the
latter. Roughly speaking, the "optimal" choice of scale $(a_N)$ is based on the "best" linear regression among
all the possible linear regressions of $\ell$ consecutive points $(a, \log(\widehat{T}_N(a)))$, where $\ell$ is a fixed integer.
Formally speaking, a contrast is minimized and the chosen scale $\widetilde{a}_N$ satisfies:

$$ \frac{\log(\widetilde{a}_N)}{\log N} \xrightarrow[N \to \infty]{\mathcal{P}} \frac{1}{2D' + 1}. $$

Thus, the adaptive estimator $\widetilde{D}_N$ of $D$ for this scale $\widetilde{a}_N$ is such that:

$$ \sqrt{\frac{N}{\widetilde{a}_N}}\, (\widetilde{D}_N - D) \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}(0; \sigma^2_D), $$

with $\sigma^2_D > 0$. Consequently, the minimax rate of convergence $N^{D'/(1+2D')}$, up to a logarithmic factor, for the
estimation of the long-memory parameter $D$ in this semi-parametric setting (see Giraitis et al., 1997) is attained
by $\widetilde{D}_N$.

Such a rate of convergence can also be obtained by other adaptive estimators (for more details see below).
However, $\widetilde{D}_N$ has several "theoretical" advantages: firstly, it can be applied to all $D < 1$ and $D' > 0$ (which are
very general conditions covering long and short memory, in fact more general conditions than those usually
required for adaptive log-periodogram or local Whittle estimators) with a nearly optimal convergence rate. Secondly,
$\widetilde{D}_N$ satisfies a central limit theorem and sharp confidence intervals for $D$ can be computed (in such a case, the
asymptotic variance $\sigma^2_D$ is replaced by $\sigma^2_{\widetilde{D}_N}$; for more details see below). Finally, under additional
assumptions on $\psi$ ($\psi$ is supposed to have its first $m$ vanishing moments), $\widetilde{D}_N$ can also be applied to a process
with a polynomial trend of degree $\le m - 1$.

We then give several simulations in order to assess the empirical properties of the adaptive estimator $\widetilde{D}_N$.
First, using a benchmark composed of 5 different "test" processes satisfying Assumption A1' (see below), the
central limit theorem satisfied by $\widetilde{D}_N$ is empirically checked. The empirical choice of the parameter $\ell$ is also
studied. Moreover, the robustness of $\widetilde{D}_N$ is successfully tested. Finally, the adaptive wavelet-based estimator
is compared with several existing adaptive estimators of the memory parameter from generated paths of the 5
different "test" processes (Giraitis-Robinson-Samarov adaptive local log-periodogram, Moulines-Soulier adaptive
global log-periodogram, Robinson local Whittle, Abry-Taqqu-Veitch data-driven wavelet-based, Bhansali-Giraitis-Kokoszka
FAR estimators). The simulation results of $\widetilde{D}_N$ are convincing. The convergence rate of
$\widetilde{D}_N$ often ranks among the best for the 5 test processes (however, the Robinson local Whittle estimator $\widehat{D}_R$
provides more uniformly accurate estimations of $D$). Three other numerical advantages are offered by the
adaptive wavelet-based estimator (and not by $\widehat{D}_R$). Firstly, it is a very low time-consuming estimator.
Secondly, it is a very robust estimator: it is not sensitive to possible polynomial trends and seems to be consistent
in non-Gaussian cases. Finally, the graph of the log-log regression of the sample variance of wavelet coefficients is
meaningful and may lead us to model data with more general processes like locally fractional Gaussian noise
(see Bardet and Bertrand, 2007).

The central limit theorem for the sample variance of wavelet coefficients is the subject of Section 2. Section 3 is
concerned with the automatic selection of the scale as well as the asymptotic behavior of $\widetilde{D}_N$. Finally, simulations
are given in Section 4 and proofs in Section 5.



2 A central limit theorem for the sample variance of wavelet coefficients

The following asymptotic behavior of the variance of wavelet coefficients is the basis of all further developments.
The first point that explains all that follows is the following property.

Property 1. Under Assumption A1 and Assumption W(∞), for $a \in \mathbb{N}^*$, $(e(a, b))_{b \in \mathbb{Z}}$ is a zero-mean Gaussian
stationary process and there exists $M > 0$ not depending on $a$ such that, for all $a \in \mathbb{N}^*$,

$$ \big| E(e^2(a, 0)) - f^*(0) K_{(\psi, D)} \cdot a^{D} \big| \le M \cdot a^{D - D'}. \qquad (6) $$

Please see Section 5 for the proofs. The paper of Moulines et al. (2007) gives similar results for multi-resolution
wavelet analysis. The special case of long-memory processes can also be studied under the weaker Assumption W(5/2):

Property 2. Under Assumption W(5/2) and Assumption A1 with $0 < D < 1$ and $0 < D' \le 2$, for $a \in \mathbb{N}^*$,
$(e(a, b))_{b \in \mathbb{Z}}$ is a zero-mean Gaussian stationary process and (6) holds.

Two corollaries can be added to both these properties. First, under Assumption A1' a more precise result can
be established.

Corollary 1. Under:

• Assumption A1' and Assumption W(∞);

• or Assumption A1' with $0 < D < 1$, $0 < D' \le 2$ and Assumption W(5/2);

then $(e(a, b))_{b \in \mathbb{Z}}$ is a zero-mean Gaussian stationary process and

$$ E(e^2(a, 0)) = f^*(0) K_{(\psi, D)} \cdot a^{D} + C_{D'} K_{(\psi, D - D')} \cdot a^{D - D'} + o(a^{D - D'}) \quad \text{when } a \to \infty. \qquad (7) $$

This corollary is a key point for the estimation of an appropriate sequence of scales $a = (a_N)$. Indeed, when
$f^* \in \mathcal{H}'(D', C_{D'})$, then $f^* \in \mathcal{H}(D'', C_{D''})$ for all $D''$ satisfying $0 < D'' \le D'$. Therefore, Assumption A1'
is required for obtaining the optimal choice of $a_N$, i.e. $a_N \simeq N^{1/(2D'+1)}$ (see below for more details). The
following corollary generalizes the above Properties 1 and 2.

Corollary 2. Properties 1 and 2 also hold when the Gaussian hypothesis on $X$ is replaced by $E X_k^2 < \infty$
for all $k \in \mathbb{Z}$.

Remark 5. In this paper, the Gaussian hypothesis has been taken into account merely to ensure the convergence
of the sample variance (5) of wavelet coefficients following a central limit theorem (see below). Such a
convergence can also be obtained for more general processes using a different proof of the central limit theorem,
for instance for linear processes (see a forthcoming work).

As mentioned in the introduction, this property allows an estimation of $D$ from a log-log regression, as soon
as a consistent estimator of $E(e^2(a, 0))$ is provided from a sample $(X_1, \dots, X_N)$ of the time series $X$. Define
then the normalized wavelet coefficient such that

$$ \widetilde{e}(a, b) = \frac{e(a, b)}{\big(f^*(0) K_{(\psi, D)} \cdot a^{D}\big)^{1/2}} \quad \text{for } a \in \mathbb{N}^* \text{ and } b \in \mathbb{Z}. \qquad (8) $$

From Property 1 it is obvious that under Assumption A1 there exists $M' > 0$ satisfying, for all $a \in \mathbb{N}^*$,

$$ \big| E(\widetilde{e}^2(a, 0)) - 1 \big| \le M' \cdot \frac{1}{a^{D'}}. $$

To use this formula to estimate $D$ by a log-log regression, an estimator of the variance of $e(a, 0)$ should be
considered (let us remember that a sample $(X_1, \dots, X_N)$ of $X$ is supposed to be known, but the parameters $(D,
D', C_{D'})$ are unknown). Consider the sample variance and the normalized sample variance of the wavelet
coefficients, for $1 \le a < N$,

$$ \widehat{T}_N(a) = \frac{1}{[N/a]} \sum_{k=1}^{[N/a]} e^2(a, k - 1) \quad \text{and} \quad \widetilde{T}_N(a) = \frac{1}{[N/a]} \sum_{k=1}^{[N/a]} \widetilde{e}^2(a, k - 1). \qquad (9) $$

The following proposition specifies a central limit theorem satisfied by $\log \widetilde{T}_N(a)$, which provides the first step
for obtaining the asymptotic properties of the estimator by log-log regression. More generally, the following
multidimensional central limit theorem for a vector $(\log \widetilde{T}_N(a_i))_i$ can be established.

Proposition 1. Define $\ell \in \mathbb{N} \setminus \{0, 1\}$ and $(r_1, \dots, r_\ell) \in (\mathbb{N}^*)^\ell$. Let $(a_n)_{n \in \mathbb{N}}$ be such that $N/a_N \to \infty$ and
$a_N \cdot N^{-1/(1+2D')} \to \infty$ as $N \to \infty$. Under Assumption A1 and Assumption W(∞),

$$ \sqrt{\frac{N}{a_N}}\, \big(\log \widetilde{T}_N(r_i a_N)\big)_{1 \le i \le \ell} \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}_\ell\big(0; \Gamma(r_1, \dots, r_\ell, \psi, D)\big), \qquad (10) $$

with $\Gamma(r_1, \dots, r_\ell, \psi, D) = (\gamma_{ij})_{1 \le i, j \le \ell}$ the covariance matrix such that

7ij ^ L Jj , E (/ , £> j; cob(u dtfmjdu) ■ (11) 

The same result can also be established under weaker assumptions on $\psi$ when $X$ is a long-memory process.

Proposition 2. Define $\ell \in \mathbb{N} \setminus \{0, 1\}$ and $(r_1, \dots, r_\ell) \in (\mathbb{N}^*)^\ell$. Let $(a_n)_{n \in \mathbb{N}}$ be such that $N/a_N \to \infty$ and
$a_N \cdot N^{-1/(1+2D')} \to \infty$ as $N \to \infty$. Under Assumption W(5/2) and Assumption A1 with $D \in (0, 1)$ and $D' \in (0, 2]$,
the CLT (10) holds.

These results can easily be generalized to processes with polynomial trends if $\psi$ is assumed to have its first
$m$ vanishing moments, i.e.

Corollary 3. Given the same hypotheses as in Proposition 1 or 2, and if $\psi$ is such that $m \in \mathbb{N} \setminus \{0, 1\}$
satisfies $\int t^p \psi(t)\, dt = 0$ for all $p \in \{0, 1, \dots, m - 1\}$, the CLT (10) also holds for any process $X' = (X'_t)_{t \in \mathbb{Z}}$
such that for all $t \in \mathbb{Z}$, $E X'_t = P_m(t)$, where $P_m(t) = a_0 + a_1 t + \cdots + a_{m-1} t^{m-1}$ is a polynomial function and
$(a_i)_{0 \le i \le m-1}$ are real numbers.

3 Adaptive estimator of memory parameter using data-driven optimal scales



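The CLT (10) implies the following CLT for the vector $(\log \widehat{T}_N(r_i a_N))_{1 \le i \le \ell}$:

$$ \sqrt{\frac{N}{a_N}}\, \Big( \log \widehat{T}_N(r_i a_N) - D \log(r_i a_N) - \log\big(f^*(0) K_{(\psi, D)}\big) \Big)_{1 \le i \le \ell} \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}_\ell\big(0; \Gamma(r_1, \dots, r_\ell, \psi, D)\big), $$

and therefore

$$ \big(\log \widehat{T}_N(r_i a_N)\big)_{1 \le i \le \ell} = A_N \cdot \begin{pmatrix} D \\ K \end{pmatrix} + \frac{1}{\sqrt{N/a_N}}\, (\varepsilon_i)_{1 \le i \le \ell}, $$

with

$$ A_N = \begin{pmatrix} \log(r_1 a_N) & 1 \\ \vdots & \vdots \\ \log(r_\ell a_N) & 1 \end{pmatrix}, \quad K = \log\big(f^*(0) \cdot K_{(\psi, D)}\big) \quad \text{and} \quad (\varepsilon_i)_{1 \le i \le \ell} \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}_\ell\big(0; \Gamma(r_1, \dots, r_\ell, \psi, D)\big). $$

Therefore, a log-log regression of $(\widehat{T}_N(r_i a_N))_{1 \le i \le \ell}$ on the scales $(r_i a_N)_{1 \le i \le \ell}$ provides an estimator
$(\widehat{D}(a_N), \widehat{K}(a_N))'$ of $(D, K)'$ such that

$$ \begin{pmatrix} \widehat{D}(a_N) \\ \widehat{K}(a_N) \end{pmatrix} = (A_N' \cdot A_N)^{-1} \cdot A_N' \cdot Y_N^{(r_1, \dots, r_\ell)} \quad \text{with} \quad Y_N^{(r_1, \dots, r_\ell)} = \big(\log \widehat{T}_N(r_i a_N)\big)_{1 \le i \le \ell}, \qquad (12) $$

which satisfies the following CLT.

Proposition 3. Under the assumptions of Proposition 1,

$$ \sqrt{\frac{N}{a_N}}\, \left( \begin{pmatrix} \widehat{D}(a_N) \\ \widehat{K}(a_N) \end{pmatrix} - \begin{pmatrix} D \\ K \end{pmatrix} \right) \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}_2\big(0; (A' \cdot A)^{-1} \cdot A' \cdot \Gamma(r_1, \dots, r_\ell, \psi, D) \cdot A \cdot (A' \cdot A)^{-1}\big), \qquad (13) $$

with $A = \begin{pmatrix} \log(r_1) & 1 \\ \vdots & \vdots \\ \log(r_\ell) & 1 \end{pmatrix}$ and $\Gamma(r_1, \dots, r_\ell, \psi, D)$ given by (11).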
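The least squares step in (12) reduces to an ordinary simple linear regression of the log sample variances on the log scales. The sketch below, with hypothetical function and variable names and only the Python standard library, uses the closed form of $(A'A)^{-1}A'Y$ for a design matrix with rows $(\log(r_i a_N), 1)$; the noise-free input is an assumption chosen so that the fit can be checked by hand.

```python
import math

def loglog_regression(scales, variances):
    """Least squares fit of log(variances) on log(scales); returns (slope, intercept).

    This is the closed form of (A'A)^{-1} A' Y when the rows of A are
    (log(scale_i), 1) and Y contains the log sample variances.
    """
    xs = [math.log(s) for s in scales]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Noise-free check: variances that follow c * a^D exactly are fitted exactly
a_n, D, c = 20, 0.4, 2.5
scales = [r * a_n for r in (1, 2, 3, 4, 5)]
variances = [c * s ** D for s in scales]
d_hat, k_hat = loglog_regression(scales, variances)
print(d_hat, math.exp(k_hat))
```

The slope plays the role of $\widehat{D}(a_N)$ and the intercept that of $\widehat{K}(a_N)$, the logarithm of the multiplicative constant of the power law.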

Moreover, under Assumption A1' and if $D \in (-1, 1)$, $\widehat{D}(a_N)$ is a semi-parametric estimator of $D$ and its
asymptotic mean square error can be minimized with an appropriate sequence of scales $(a_N)$, reaching the
well-known minimax rate of convergence for the memory parameter $D$ in this semi-parametric setting (see for instance
Giraitis et al., 1997 and 2000). Indeed,

Proposition 4. Let $X$ satisfy Assumption A1' with $D \in (-1, 1)$ and $\psi$ satisfy Assumption W(∞). Let $(a_N)$ be a
sequence such that $a_N = [N^{1/(1+2D')}]$. Then, the estimator $\widehat{D}(a_N)$ is rate optimal in the minimax sense, i.e.

$$ \limsup_{N \to \infty} \ \sup_{D \in (-1, 1)} \ \sup_{f^* \in \mathcal{H}(D', C_{D'})} N^{\frac{2D'}{1+2D'}} \cdot E\big[(\widehat{D}(a_N) - D)^2\big] < +\infty. $$

Remark 6. As far as we know, there are no theoretical results of optimality in the case $D \le -1$, but according
to the usual non-parametric theory, such minimax results can also be obtained. Moreover, in the case
of long-memory processes (if $D \in (0, 1)$), under Assumption A1' for $X$ and Assumption W(5/2) for $\psi$, the
estimator $\widehat{D}(a_N)$ is also rate optimal in the minimax sense.

In the previous Propositions 1 and 2, the rate of convergence of the scale $a_N$ obeys the following condition:

$$ \frac{N}{a_N} \xrightarrow[N \to \infty]{} \infty \quad \text{and} \quad \frac{a_N}{N^{1/(1+2D')}} \xrightarrow[N \to \infty]{} \infty \quad \text{with } D' \in (0, \infty). $$

Now, for better readability, take $a_N = N^{\alpha}$. Then, the above condition becomes:

$$ a_N = N^{\alpha} \quad \text{with } \alpha^* < \alpha < 1 \text{ and } \alpha^* = \frac{1}{1 + 2D'}. \qquad (14) $$

Thus an optimal choice (leading to a faster convergence rate of the estimator) is obtained for $\alpha = \alpha^* + \varepsilon$
with $\varepsilon \to 0^+$. But $\alpha^*$ depends on $D'$, which is unknown. To solve this problem, Veitch et al. (2003)
suggest a chi-square-based test (constructed from a distance between the regression line and the different points
$(\log \widehat{T}_N(r_i a_N), \log(r_i a_N))$). It seems to be an efficient and interesting numerical way to estimate $D$, but
without theoretical proofs (contrary to global or local log-periodogram procedures, which are proved to reach the
minimax convergence rate; see for instance Moulines and Soulier, 2003).
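To fix ideas, the critical exponent $\alpha^*$ of (14) and the resulting rate can be evaluated for a few values of $D'$. This is only a numerical illustration with hypothetical function names; it also checks that the rate $\sqrt{N/a_N}$ at $a_N = N^{\alpha^*}$ is $N^{D'/(1+2D')}$, since $(1 - \alpha^*)/2 = D'/(1+2D')$.

```python
def alpha_star(d_prime):
    """Critical exponent alpha* = 1 / (1 + 2 D') from (14)."""
    return 1.0 / (1.0 + 2.0 * d_prime)

def minimax_rate_exponent(d_prime):
    """Exponent D' / (1 + 2 D') of the minimax rate N^(D'/(1+2D'))."""
    return d_prime / (1.0 + 2.0 * d_prime)

n = 10 ** 4
for d_prime in (0.5, 1.0, 2.0):
    a_opt = n ** alpha_star(d_prime)             # scale of order N^(alpha*)
    rate = n ** minimax_rate_exponent(d_prime)   # equals sqrt(N / a_opt)
    print(d_prime, alpha_star(d_prime), a_opt, rate)
```

For $D' = 2$ this gives $\alpha^* = 0.2$ and for $D' = 1$ it gives $\alpha^* = 1/3$, the two values that appear in the simulations section.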



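We suggest a new procedure for the data-driven selection of optimal scales, i.e. optimal $\alpha$. Let us consider
an important parameter, the number of considered scales $\ell \in \mathbb{N} \setminus \{0, 1, 2\}$, and set $(r_1, \dots, r_\ell) = (1, \dots, \ell)$.
For $\alpha \in (0, 1)$, define also

• the vector $Y_N(\alpha) = \big(\log \widehat{T}_N(i \cdot N^{\alpha})\big)_{1 \le i \le \ell}$;

• the matrix $A_N(\alpha) = \begin{pmatrix} \log(N^{\alpha}) & 1 \\ \vdots & \vdots \\ \log(\ell \cdot N^{\alpha}) & 1 \end{pmatrix}$;

• the contrast $Q_N(\alpha, D, K) = \Big(Y_N(\alpha) - A_N(\alpha) \cdot \begin{pmatrix} D \\ K \end{pmatrix}\Big)' \cdot \Big(Y_N(\alpha) - A_N(\alpha) \cdot \begin{pmatrix} D \\ K \end{pmatrix}\Big)$.

$Q_N(\alpha, D, K)$ corresponds to a squared distance between the $\ell$ points $\big(\log(i \cdot N^{\alpha}), \log \widehat{T}_N(i \cdot N^{\alpha})\big)_i$ and a line.
The point is to minimize this contrast over these three parameters. It is obvious that for a fixed $\alpha \in (0, 1)$, $Q$ is
minimized by the previous least squares regression and therefore

$$ Q_N\big(\widehat{\alpha}_N, \widehat{D}(\widehat{a}_N), \widehat{K}(\widehat{a}_N)\big) = \min_{\alpha \in (0,1),\, D < 1,\, K \in \mathbb{R}} Q_N(\alpha, D, K), $$

with $\widehat{a}_N = N^{\widehat{\alpha}_N}$ and $(\widehat{D}(\widehat{a}_N), \widehat{K}(\widehat{a}_N))$ obtained as in relation (12). However, since $\widehat{\alpha}_N$ has to be obtained
from numerical computations, the interval $(0, 1)$ can be discretized as follows:

$$ \widehat{\alpha}_N \in \mathcal{A}_N = \Big\{ \frac{2}{\log N}, \frac{3}{\log N}, \dots, \frac{\log[N/\ell]}{\log N} \Big\}. $$

Hence, if $\alpha \in \mathcal{A}_N$, there exists $k \in \{2, 3, \dots, \log[N/\ell]\}$ such that $k = \alpha \cdot \log N$.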
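The selection step described above (fit a line for each $\alpha$ on the grid and keep the $\alpha$ with the smallest contrast) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `variance_at` is an assumed callback returning the sample variance of the wavelet coefficients at a given scale (any rounding of $i \cdot N^{\alpha}$ to an integer scale is left to it), and the toy variance profile is a noise-free assumption in the spirit of (7).

```python
import math

def fit_line(xs, ys):
    """Least squares line through (xs, ys); returns (slope, intercept, rss)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss

def select_alpha(variance_at, n, ell):
    """Minimize the contrast over alpha = k / log(n), k = 2, ..., [log([n / ell])].

    Returns (alpha_hat, D_hat, K_hat, minimal contrast), or None if the grid is empty.
    """
    log_n = math.log(n)
    best = None
    for k in range(2, int(math.log(n // ell)) + 1):
        alpha = k / log_n
        scales = [i * n ** alpha for i in range(1, ell + 1)]
        xs = [math.log(a) for a in scales]
        ys = [math.log(variance_at(a)) for a in scales]
        slope, intercept, rss = fit_line(xs, ys)
        if best is None or rss < best[3]:
            best = (alpha, slope, intercept, rss)
    return best

# Toy, noise-free variance profile a^D * (1 + a^(-D')): an assumption for illustration
D, D_PRIME = 0.5, 1.0

def toy_variance(a):
    return a ** D * (1.0 + a ** (-D_PRIME))

alpha_hat, d_hat, k_hat, q_min = select_alpha(toy_variance, n=10_000, ell=5)
print(alpha_hat, d_hat, q_min)
```

In this noise-free toy case the contrast only measures the curvature of the log-log graph, which decreases with the scale, so the largest grid point is selected. With real data the regression error variance increases with the scale, which is what makes an intermediate $\alpha$ preferable.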

Remark 7. This choice of discretization is implied by the following proof of the consistency of $\widehat{\alpha}_N$. If the
interval $(0, 1)$ is split into $N^{\beta}$ points, with $\beta > 0$, the proof used cannot establish this consistency. Finally,
it is the same framework as the usual discrete wavelet transform (see for instance Veitch et al., 2003) but
less restricted, since $\log N$ may be replaced in the previous expression of $\mathcal{A}_N$ by any function of $N$ that is negligible
compared to the functions $N^{\beta}$ with $\beta > 0$ (for instance, $(\log N)^d$ or $d \log N$ can be used).

Consequently, take

$$ \widehat{Q}_N(\alpha) = Q_N\big(\alpha, \widehat{D}(a_N), \widehat{K}(a_N)\big) \quad \text{with } a_N = N^{\alpha}; $$

then, minimizing $Q_N$ over the variables $(\alpha, D, K)$ is equivalent to minimizing $\widehat{Q}_N$ over the variable $\alpha \in \mathcal{A}_N$, that is

$$ \widehat{Q}_N(\widehat{\alpha}_N) = \min_{\alpha \in \mathcal{A}_N} \widehat{Q}_N(\alpha). $$

From this central limit theorem derives

Proposition 5. Let $X$ satisfy Assumption A1' and $\psi$ satisfy Assumption W(∞) (or Assumption W(5/2) if $0 < D < 1$
and $0 < D' \le 2$). Then,

$$ \widehat{\alpha}_N = \frac{\log \widehat{a}_N}{\log N} \xrightarrow[N \to \infty]{\mathcal{P}} \alpha^* = \frac{1}{1 + 2D'}. $$

This also proves the consistency of an estimator $\widehat{D}'_N$ of the parameter $D'$.

Corollary 4. Under the hypotheses of Proposition 5, we have

$$ \widehat{D}'_N = \frac{1 - \widehat{\alpha}_N}{2 \widehat{\alpha}_N} \xrightarrow[N \to \infty]{\mathcal{P}} D'. $$

The estimator $\widehat{\alpha}_N$ defines the selected scale $\widehat{a}_N$ such that $\widehat{a}_N = N^{\widehat{\alpha}_N}$. From a straightforward application of
the proof of Proposition 5 (see the details in the proof of Theorem 1), the asymptotic behavior of $\widehat{a}_N$ can be
specified, that is,

$$ \Pr\Big( \frac{N^{\alpha^*}}{(\log N)^{\lambda}} \le N^{\widehat{\alpha}_N} \le N^{\alpha^*} \cdot (\log N)^{\mu} \Big) \xrightarrow[N \to \infty]{} 1, \qquad (15) $$



for all positive real numbers $\lambda$ and $\mu$ such that $\lambda > \frac{2}{(\ell - 2) D'}$ and $\mu > \frac{12}{\ell - 2}$. Consequently, the selected scale is
asymptotically equal to $N^{\alpha^*}$ up to a logarithmic factor.



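Finally, Proposition 5 can be used to define an adaptive estimator of $D$. First, define the straightforward
estimator

$$ \widehat{D}_N = \widehat{D}(\widehat{a}_N), $$

which should minimize the mean square error using $\widehat{a}_N$. However, the estimator $\widehat{D}_N$ does not satisfy a CLT
since $\Pr(\widehat{\alpha}_N \le \alpha^*) > 0$, and therefore it cannot be asserted that $E\big(\sqrt{N/\widehat{a}_N}\, (\widehat{D}_N - D)\big) = 0$. To establish a CLT
satisfied by an adaptive estimator $\widetilde{D}_N$ of $D$, an adaptive scale sequence $(\widetilde{a}_N) = (N^{\widetilde{\alpha}_N})$ has to be defined to
ensure $\Pr(\widetilde{\alpha}_N \le \alpha^*) \to 0$ as $N \to \infty$. The following theorem provides the asymptotic behavior of such an estimator.

Theorem 1. Let $X$ satisfy Assumption A1' and $\psi$ satisfy Assumption W(∞) (or Assumption W(5/2) if $0 < D < 1$
and $0 < D' \le 2$). Define

$$ \widetilde{\alpha}_N = \widehat{\alpha}_N + \frac{3}{(\ell - 2) \widehat{D}'_N} \cdot \frac{\log \log N}{\log N}, \quad \widetilde{a}_N = N^{\widetilde{\alpha}_N} = N^{\widehat{\alpha}_N} \cdot (\log N)^{\frac{3}{(\ell - 2) \widehat{D}'_N}} \quad \text{and} \quad \widetilde{D}_N = \widehat{D}(\widetilde{a}_N). $$

Then, with $\sigma^2_D = (1 \ 0) \cdot (A' \cdot A)^{-1} \cdot A' \cdot \Gamma(1, \dots, \ell, \psi, D) \cdot A \cdot (A' \cdot A)^{-1} \cdot (1 \ 0)'$,

$$ \sqrt{\frac{N}{N^{\widetilde{\alpha}_N}}}\, (\widetilde{D}_N - D) \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}(0; \sigma^2_D) \quad \text{and} \quad \forall \rho > \frac{2(1 + 3D')}{(\ell - 2) D'}, \quad \frac{N^{\frac{D'}{1+2D'}}}{(\log N)^{\rho}} \cdot |\widetilde{D}_N - D| \xrightarrow[N \to \infty]{\mathcal{P}} 0. \qquad (16) $$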
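The correction of the selected exponent in Theorem 1 is a one-line computation once $\widehat{\alpha}_N$ is available. The following sketch uses hypothetical function names and an arbitrary value of $\widehat{\alpha}_N$ chosen only for illustration; it also checks numerically that $N^{\widetilde{\alpha}_N} = N^{\widehat{\alpha}_N} (\log N)^{3/((\ell-2)\widehat{D}'_N)}$.

```python
import math

def d_prime_hat(alpha_hat):
    """Estimator of D' from Corollary 4: (1 - alpha_hat) / (2 * alpha_hat)."""
    return (1.0 - alpha_hat) / (2.0 * alpha_hat)

def corrected_alpha(alpha_hat, n, ell):
    """alpha_tilde = alpha_hat + 3 / ((ell - 2) * D'_hat) * log(log n) / log n."""
    shift = 3.0 / ((ell - 2) * d_prime_hat(alpha_hat))
    return alpha_hat + shift * math.log(math.log(n)) / math.log(n)

# Arbitrary illustrative values (not taken from the simulations)
n, ell, alpha_hat = 10 ** 5, 15, 0.25
alpha_tilde = corrected_alpha(alpha_hat, n, ell)
a_hat = n ** alpha_hat
a_tilde = n ** alpha_tilde
print(d_prime_hat(alpha_hat), alpha_tilde, a_hat, a_tilde)
```

The corrected scale is larger than the selected one by a power of $\log N$ only, which is why the convergence rate is preserved up to a logarithmic factor.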



Remark 8. Both the adaptive estimators $\widehat{D}_N$ and $\widetilde{D}_N$ converge to $D$ with a rate of convergence equal
to the minimax rate of convergence $N^{\frac{D'}{1+2D'}}$ up to a logarithmic factor (this result being classical within this
semi-parametric framework). Unfortunately, our method cannot prove that the mean square error of both these
estimators reaches the optimal rate, and therefore that they are oracles.

To conclude this theoretical approach, the main properties satisfied by the estimators $\widehat{D}_N$ and $\widetilde{D}_N$ can be
summarized as follows:

1. Both the adaptive estimators $\widehat{D}_N$ and $\widetilde{D}_N$ converge to $D$ with a rate of convergence equal to the
minimax rate of convergence $N^{\frac{D'}{1+2D'}}$ up to a logarithmic factor for all $D < 1$ and $D' > 0$ (these being
very general conditions covering long and short memory, even more general than the usual conditions required for
adaptive log-periodogram or local Whittle estimators), with $X$ considered a Gaussian process.

2. The estimator $\widetilde{D}_N$ satisfies the CLT (16) and therefore sharp confidence intervals for $D$ can be computed
(in which case, the asymptotic matrix $\Gamma(1, \dots, \ell, \psi, D)$ is replaced by $\Gamma(1, \dots, \ell, \psi, \widetilde{D}_N)$). This is not
applicable to adaptive log-periodogram or local Whittle estimators.

3. The main Property 1 is also satisfied without the Gaussian hypothesis. Therefore, the adaptive estimators
$\widehat{D}_N$ and $\widetilde{D}_N$ can also be interesting estimators of $D$ for non-Gaussian processes like linear or more general
processes (but a CLT similar to Theorem 1 has to be established...).

4. Under additional assumptions on $\psi$ ($\psi$ is supposed to have its first $m$ vanishing moments), both estimators
$\widehat{D}_N$ and $\widetilde{D}_N$ can also be used for a process $X$ with a polynomial trend of degree $\le m - 1$, which again
cannot be achieved with adaptive log-periodogram or local Whittle estimators.

4 Simulations 

The adaptive wavelet-based estimators $\widehat{D}_N$ and $\widetilde{D}_N$ are new estimators of the memory parameter $D$ in the
semi-parametric framework. Different estimators of this kind are also reported in other research works and have
been proved optimal. In this paper, some theoretical advantages of adaptive wavelet-based estimators have been
highlighted. But what about the concrete procedure and results of such estimators applied to an observed sample?
The following simulations will help to answer this question.

First, the properties (consistency, robustness, choice of the parameter $\ell$ and of the mother wavelet function $\psi$) of
$\widehat{D}_N$ and $\widetilde{D}_N$ are investigated. Secondly, in the case of Gaussian long-memory processes (with $D \in (0, 1)$ and
$D' \le 2$), the simulation results of the estimator $\widetilde{D}_N$ are compared to those obtained with the best-known
semi-parametric long-memory estimators.

To begin with, the simulation conditions have to be specified. The results are obtained from 100 generated
independent samples of each process belonging to the following "benchmark". The concrete procedures
of generation of these processes are obtained from the circulant matrix method, as detailed in Doukhan et al.
(2003). The simulations are carried out for different values of $D$, $N$ and processes which satisfy Assumption A1'
and therefore Assumption A1 (the article of Moulines et al., 2007, gives a lot of details on this point):

1. the fractional Gaussian noise (fGn) of parameter $H = (D + 1)/2$ (for $-1 < D < 1$) and $\sigma^2 = 1$. The
spectral density $f_{fGn}$ of a fGn is such that $f^*_{fGn}$ is included in $\mathcal{H}(2, C_2)$ (thus $D' = 2$);

2. the FARIMA[p,d,q] process with parameter $d$ such that $d = D/2 \in (-0.5, 0.5)$ (therefore $-1 < D < 1$),
the innovation variance $\sigma^2$ satisfying $\sigma^2 = 1$ and $p, q \in \mathbb{N}$. The spectral density $f_{FARIMA}$ of such a
process is such that $f^*_{FARIMA}$ is included in the set $\mathcal{H}(2, C_2)$ (thus $D' = 2$);

3. the Gaussian stationary process $X^{(D, D')}$, such that its spectral density is

$$ f_3(\lambda) = \frac{1}{|\lambda|^{D}} \big(1 + |\lambda|^{D'}\big) \quad \text{for } \lambda \in [-\pi, \pi], \qquad (17) $$

with $D \in (-\infty, 1)$ and $D' \in (0, \infty)$. Therefore $f_3^* = 1 + |\lambda|^{D'} \in \mathcal{H}(D', 1)$ with $D' \in (0, \infty)$.
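The value $D' = 2$ for FARIMA processes can be checked numerically in the simplest case. The sketch below assumes the standard FARIMA(0,d,0) spectral density $f(\lambda) = \frac{\sigma^2}{2\pi} |2\sin(\lambda/2)|^{-2d}$ (a textbook formula, not restated in this paper) and hypothetical function names. Since $2\sin(\lambda/2) = \lambda(1 - \lambda^2/24 + \dots)$, one expects $f^*(\lambda) - f^*(0) \approx f^*(0) \frac{D}{24} \lambda^2$ with $D = 2d$.

```python
import math

def farima_f_star(lam, d, sigma2=1.0):
    """f*(lam) = |lam|^(2d) * f(lam) for a FARIMA(0, d, 0) process.

    Assumes f(lam) = sigma2 / (2 pi) * |2 sin(lam / 2)|^(-2d), so that D = 2d.
    """
    if lam == 0.0:
        return sigma2 / (2.0 * math.pi)
    ratio = abs(2.0 * math.sin(lam / 2.0) / lam)
    return sigma2 / (2.0 * math.pi) * ratio ** (-2.0 * d)

D = 0.5
d = D / 2.0
f0 = farima_f_star(0.0, d)
for lam in (0.4, 0.2, 0.1, 0.05):
    # The ratio stabilizes near f0 * D / 24: f* - f*(0) behaves like C * lam^2
    print(lam, (farima_f_star(lam, d) - f0) / lam ** 2, f0 * D / 24.0)
```

The quadratic behavior of $f^* - f^*(0)$ near the origin is what the statement "thus $D' = 2$" expresses.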
In the long-memory framework, a "benchmark" of processes is considered for $D = 0.1, 0.3, 0.5, 0.7, 0.9$:

• fGn processes with parameters $H = (D + 1)/2$ and $\sigma^2 = 1$;

• FARIMA[0,d,0] processes with $d = D/2$ and standard Gaussian innovations;

• FARIMA[1,d,0] processes with $d = D/2$, standard Gaussian innovations and AR coefficient $\phi = 0.95$;

• FARIMA[1,d,1] processes with $d = D/2$, standard Gaussian innovations, AR coefficient $\phi = -0.3$
and MA coefficient $\theta = 0.7$;

• $X^{(D, D')}$ Gaussian processes with $D' = 1$.

4.1 Properties of adaptive wavelet-based estimators from simulations

Below, we give the different properties of the adaptive wavelet-based method.

Choice of the mother wavelet $\psi$: For short-memory processes ($D \le 0$), let the wavelet $\psi_{SM}$ be such
that $\psi_{SM}(t) = (t^2 - t + a) \exp(-1/(t(1 - t)))$ with $a \simeq 0.23087577$. It satisfies Assumption W(∞).
Lemarié-Meyer wavelets can also be investigated, but this will lead to quite different theoretical studies since their
support is not bounded (but "essentially" compact).

For long-memory processes ($0 < D < 1$), let the mother wavelet $\psi_{LM}$ be such that
$\psi_{LM}(t) = 100 \cdot t^2 (t - 1)^2 (t^2 - t + 3/14)\, \mathbb{1}_{0 \le t \le 1}$, which satisfies Assumption W(5/2). Note that the
Daubechies mother wavelet or $\psi_{SM}$ lead to "similar" results (but not as good).
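The zero-integral condition required of both wavelets can be checked numerically. The sketch below uses a plain midpoint rule and hypothetical function names; it prints the integrals of $\psi(t)$ and of $t\,\psi(t)$ over $[0,1]$ for the two wavelets above, without asserting anything beyond what the formulas imply.

```python
import math

def psi_sm(t, a=0.23087577):
    """psi_SM(t) = (t^2 - t + a) * exp(-1 / (t (1 - t))) on (0, 1), zero elsewhere."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return (t * t - t + a) * math.exp(-1.0 / (t * (1.0 - t)))

def psi_lm(t):
    """psi_LM(t) = 100 t^2 (t - 1)^2 (t^2 - t + 3/14) on [0, 1], zero elsewhere."""
    if t < 0.0 or t > 1.0:
        return 0.0
    return 100.0 * t**2 * (t - 1.0)**2 * (t**2 - t + 3.0 / 14.0)

def moment(psi, power=0, n=20000):
    """Midpoint rule for the integral of t^power * psi(t) over [0, 1]."""
    h = 1.0 / n
    return h * sum(((j + 0.5) * h) ** power * psi((j + 0.5) * h) for j in range(n))

for name, psi in (("psi_SM", psi_sm), ("psi_LM", psi_lm)):
    print(name, moment(psi, 0), moment(psi, 1))
```

For $\psi_{LM}$ the integral vanishes exactly (the constant $3/14$ is what makes it so), and because $\psi_{LM}$ is symmetric about $t = 1/2$ its first moment vanishes as well.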
Choice of the parameter $\ell$: This parameter is very important to estimate the "beginning" of the linear
part of the graph drawn by the points $(\log(a_i), \log \widehat{T}_N(a_i))_i$. On the one hand, if $\ell$ is too small a number
(for instance $\ell = 3$), another small linear part of this graph (even before the "true" beginning $N^{\alpha^*}$) may be
chosen; consequently, the $\sqrt{MSE}$ (square root of the mean square error) of $\widehat{\alpha}_N$, and therefore of $\widehat{D}_N$ or $\widetilde{D}_N$,
will be too large. On the other hand, if $\ell$ is too large a number (for instance $\ell = 50$ for $N = 1000$), the
estimator $\widehat{\alpha}_N$ will certainly satisfy $\widehat{\alpha}_N < \alpha^*$, since it will not be possible to consider $\ell$ different scales larger
than $N^{\alpha^*}$ (if $D' = 1$, and therefore $\alpha^* = 1/3$, then $a_N$ has to satisfy both that $N/(50\, a_N) = 20/a_N$ is a large number
and that $a_N > N^{1/3} = 10$; this is not really possible). Moreover, it is possible that a "good" choice of $\ell$ depends on
the "flatness" of the spectral density $f$, i.e. on $D'$. We have carried out simulations for different values
of $\ell$ (and $N$ and $D$). Only the $\sqrt{MSE}$ of the estimators are presented. The results are given in Table 1.
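The arithmetic behind the example $\ell = 50$, $N = 1000$ can be spelled out in a few lines. This is only an illustration with hypothetical names: at the smallest admissible scale $N^{\alpha^*}$, the last regression point uses the scale $\ell \cdot N^{\alpha^*}$, and the number of available wavelet coefficients at that scale is about $N/(\ell \cdot N^{\alpha^*})$.

```python
def coefficients_at_largest_scale(n, ell, d_prime):
    """Approximate number of wavelet coefficients at the scale ell * N^(alpha*)."""
    alpha_star = 1.0 / (1.0 + 2.0 * d_prime)
    a_min = n ** alpha_star          # smallest admissible scale N^(alpha*)
    return n / (ell * a_min)

n, d_prime = 1000, 1.0
for ell in (15, 50):
    print(ell, coefficients_at_largest_scale(n, ell, d_prime))
```

With $\ell = 50$ only about two coefficients remain at the largest scale, which is far too few to estimate a variance, whereas $\ell = 15$ leaves between six and seven.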

In Table 1, two phenomena can be distinguished: the detection of $\alpha^*$ and the estimation of $D$:

• To estimate $\alpha^*$, $\ell$ has to be small enough, especially because "$D'$ close to 0", and hence "$\alpha^*$ close to 1", is
possible. However, our simulations indicate that $\ell$ must not be too small (for instance $\ell = 5$ leads to a
large MSE for $\widehat{\alpha}_N$, implying a large MSE for $\widehat{D}_N$), and the choice seems to be independent of $N$ (the cases
$N = 1000$ and $N = 10000$ are quite similar). Hence, our choice is $\ell_1 = 15$ to estimate $\alpha^*$ for any $N$.

• To estimate $D$, once $\alpha^*$ is estimated, a second value $\ell_2$ of $\ell$ can be chosen. We use an adaptive procedure
which, roughly speaking, consists in determining the "end" of the acceptable linear zone. Firstly, we
use again the same procedure as for estimating $\widehat{\alpha}_N$ but with scales $(a_N / i)_{1 \le i \le \ell_1}$ and $\ell_1 = 15$. It
provides an estimator $\widehat{b}_N$ corresponding to the maximum of the acceptable (for a linear regression) scales.
Secondly, the adaptive number of scales $\ell_2$ is computed from the formula $\ell_2 = \widehat{\ell} = [\widehat{b}_N / \widehat{a}_N]$.
The simulations carried out with such values of $\ell_1$ and $\ell_2$ are detailed in Table 1.

As may be seen in Table 1, the choice of parameters ($\ell_1 = 15$, $\ell_2 = \widehat{\ell}$) provides the best results for estimating
$D$, almost uniformly for all processes.

Consistency of the estimators $\widehat{\alpha}_N$ and $\widetilde{\alpha}_N$: the previous numerical results (here
we consider $\ell_1 = 15$) show that $\widehat{\alpha}_N$ and $\widetilde{\alpha}_N$ converge (very slowly) to the
optimal rate $\alpha^*$, that is 0.2 for the first four processes and 1/3 for the fifth. Figure 1 illustrates the
evolution with $N$ of the log-log plot and of the choice of the onset of scaling.

Figure 1 shows that $\log T_N(i \cdot N^{\alpha})$ is not a linear function of the logarithm of the scales
$\log(i \cdot N^{\alpha})$ when $N$ increases and $\alpha < \alpha^*$ (a consequence of Property 1: it means there is
a bias). Moreover, if $\alpha > \alpha^*$ and $\alpha$ increases, a linear model appears with an increasing error
variance.

Figure 1: Log-log graphs for different samples of $X^{(D,D')}$ with $D = 0.5$ and $D' = 1$ when $N = 10^3$ (up and
left, $\widehat{D}_N \simeq 1.04$), $N = 10^4$ (up and right, $\widehat{D}_N \simeq 0.66$), $N = 10^5$ (down and left,
$\widehat{D}_N \simeq 0.62$) and $N = 10^6$ (down and right, $\widehat{D}_N \simeq 0.54$).

Consistency and distribution of the estimators $\widehat{D}_N$ and $\widetilde{D}_N$: The results of Table 1 show
the consistency with $N$ of $\widehat{D}_N$ and $\widetilde{D}_N$ using only $\ell_1 = 15$. Figure 2 provides the
histograms of $\widehat{D}_N$ and $\widetilde{D}_N$ for 100 independent samples of FARIMA(1, d, 1) processes with
$D = 0.5$ and $N = 10^5$. Both histograms of Figure 2 are similar to Gaussian distribution histograms. This is not
surprising for $\widetilde{D}_N$, since Theorem 1 shows that the asymptotic distribution of $\widetilde{D}_N$ is a
Gaussian distribution with mean equal to $D$. The asymptotic distribution of $\widehat{D}_N$ also seems to be close
to a Gaussian distribution. A Cramér-von Mises test of normality indicates that both distributions of
$\widehat{D}_N$ and $\widetilde{D}_N$ can be considered Gaussian (respectively $W \simeq 0.07$, p-value
$\simeq 0.24$ and $W \simeq 0.05$, p-value $\simeq 0.54$).

Figure 2: Histograms of $\widehat{D}_N$ and $\widetilde{D}_N$ for 100 samples of FARIMA(1, d, 1) with $D = 0.5$
for $N = 10^5$.
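
For readers who want to reproduce this kind of normality check, the Cramér-von Mises statistic can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the software used for the results above: the function name is hypothetical, the mean and standard deviation are estimated from the sample (an assumption about the variant of the test), and no p-value is computed because its calibration depends on that variant.

```python
from statistics import NormalDist, mean, stdev


def cramer_von_mises_normal(sample):
    """Cramer-von Mises statistic W^2 of a sample against a normal law whose
    mean and standard deviation are estimated from the same sample."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    ordered = sorted(sample)
    w2 = 1.0 / (12.0 * n)
    for i, x in enumerate(ordered, start=1):
        # Squared gap between the fitted CDF and the mid-rank plotting position.
        w2 += (fitted.cdf(x) - (2 * i - 1) / (2.0 * n)) ** 2
    return w2


if __name__ == "__main__":
    n = 100
    # A sample made of exact standard normal quantiles versus a strongly skewed one.
    gaussian_like = [NormalDist().inv_cdf((2 * i - 1) / (2.0 * n)) for i in range(1, n + 1)]
    skewed = [float(i * i) for i in range(1, n + 1)]
    print(cramer_von_mises_normal(gaussian_like))
    print(cramer_von_mises_normal(skewed))
```

A small value of the statistic, as reported for both estimators above, is compatible with normality; a large value signals a departure from it.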

Consistency in case of short memory: Table 2 provides the behavior of $\widehat{D}_N$ and $\widetilde{D}_N$
if $D < 0$ and $D' > 0$. Two processes are considered in such a frame: a FARIMA(0, d, 0) process with
$-0.5 < d < 0$ and therefore $-1 < D < 0$ (always with $D' = 2$) and a process $X^{(D,D')}$ with $D < 0$ and
$D' > 0$.

The results are displayed in Table 2 (here $N = 1000$, $N = 10000$ and $N = 100000$, $\ell_1 = 15$ and
$\ell_2 = [5\,N^{0.1}]$) for different choices of $D$ and $D'$. Thus it appears that $\widehat{D}_N$ and
$\widetilde{D}_N$ can be successfully applied to short memory processes as well. Moreover, the larger $D'$, the
faster their convergence rates.

Robustness of $\widehat{D}_N$, $\widetilde{D}_N$: To conclude with the numerical properties of the estimators, four
different processes not satisfying Assumption A1' are considered:

• a FARIMA(0, d, 0) process (denoted P1) with innovations following a uniform law (so that the innovations have
a finite second-order moment);

• a FARIMA(0, d, 0) process (denoted P2) with innovations having the density w.r.t. Lebesgue measure
$f(x) = \frac{3}{4}\,(1 + |x|)^{-5/2}$ for $x \in \mathbb{R}$ (so that the second-order moment of the innovations
is infinite but their first absolute moment is finite);

• a FARIMA(0, d, 0) process (denoted P3) with innovations following a Cauchy distribution (so that the first
absolute moment of the innovations is infinite);

• a Gaussian stationary process (denoted P4) with spectral density $f(\lambda) = \big||\lambda| - \pi/2\big|^{-1/2}$
for all $\lambda \in [-\pi, \pi] \setminus \{-\pi/2, \pi/2\}$. The local behavior of $f$ at 0 is
$f(|\lambda|) \sim (\pi/2)^{-1/2}\,|\lambda|^{-D}$ with $D = 0$, but the smoothness condition for $f$ in
Assumption A1 is not satisfied.

For the first three processes, $D$ varies in $\{0.1, 0.3, 0.5, 0.7, 0.9\}$ and 100 independent replications are
taken into account. The results of these simulations are given in Table 3.
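
The heavy-tailed innovations of process P2 are easy to simulate by inversion, since the density $\frac{3}{4}(1+|x|)^{-5/2}$ has the closed-form distribution function $F(x) = 1 - \frac{1}{2}(1+x)^{-3/2}$ for $x \ge 0$ (and $F(x) = \frac{1}{2}(1+|x|)^{-3/2}$ for $x < 0$). The sketch below is illustrative only: the function names are hypothetical and it is not claimed to be the generator used for Table 3.

```python
import random


def p2_cdf(x):
    """CDF of the density f(x) = 3/4 * (1 + |x|) ** (-5/2)."""
    tail = 0.5 * (1.0 + abs(x)) ** (-1.5)
    return 1.0 - tail if x >= 0 else tail


def p2_quantile(u):
    """Inverse of p2_cdf for 0 < u < 1."""
    if u >= 0.5:
        return (2.0 * (1.0 - u)) ** (-2.0 / 3.0) - 1.0
    return -((2.0 * u) ** (-2.0 / 3.0) - 1.0)


def sample_p2_innovations(n, rng):
    """Draw n innovations by applying the quantile function to uniform variates."""
    out = []
    while len(out) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # skip the endpoint u = 0, where the quantile is infinite
            out.append(p2_quantile(u))
    return out


if __name__ == "__main__":
    innovations = sample_p2_innovations(5, random.Random(0))
    print(innovations)
```

Feeding such innovations into a FARIMA(0, d, 0) filter gives a linear process with infinite variance, which is the situation the robustness study is meant to probe.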
As outlined in the theoretical part of this paper, the estimators $\widehat{D}_N$ and $\widetilde{D}_N$ also seem to
be accurate for $L^2$-linear processes. For $L^{\alpha}$-linear processes with $1 < \alpha < 2$, they also converge,
but with a slower rate of convergence. Although the spectral density of process P4 does not satisfy the smoothness
hypothesis required in Assumptions A1 or A1', the convergence rates of $\widehat{D}_N$ and $\widetilde{D}_N$ are
still convincing. These results confirm the robustness of wavelet based estimators.

4.2 Comparisons with other semi-parametric long-memory parameter estimators 
from simulations 

Here we consider only long-memory Gaussian processes ($D \in (0, 1)$) based on the usual hypothesis $0 < D' \le 2$.
More precisely, the "benchmark" is: 100 generated independent samples of each process with length $N = 10^3$
and $N = 10^4$ and different values of $D$, $D = 0.1, 0.3, 0.5, 0.7, 0.9$. Several different semi-parametric
estimators of $D$ are considered:

• $\widehat{D}_{BGK}$ is an "optimal" parametric Whittle estimator obtained from a BIC criterion model selection
of fractionally differenced autoregressive models (introduced by Bhansali et al., 2006). The required
confidence interval of the estimation $\widehat{D}_{BGK}$ is $[\widehat{D}_R - 2/N^{1/4}, \widehat{D}_R + 2/N^{1/4}]$;

• $\widehat{D}_{GRS}$ is an adaptive local periodogram estimator introduced by Giraitis et al. (2000). It requires
two parameters: a bandwidth parameter $m$, with a procedure of determination provided in this article, and
a number of low trimmed frequencies $l$ (satisfying different conditions but without being fixed in this
paper; after a number of simulations, $l = \max(m^{1/3}, 10)$ is chosen);

• $\widehat{D}_{MS}$ is an adaptive global periodogram estimator introduced by Moulines and Soulier (1998, 2003),
also called FEXP estimator, with bias-variance balance parameter $\kappa = 2$;

• $\widehat{D}_R$ is a local Whittle estimator introduced by Robinson (1995). The trimming parameter is $m = N/30$;

• $\widehat{D}_{ATV}$ is an adaptive wavelet based estimator introduced by Veitch et al. (2003) using a Db4 wavelet
(and described above);

• $\widetilde{D}_N$ defined previously with $\ell_1 = 15$ and $\ell_2 = N^{1-\widetilde{\alpha}_N}/10$ and a mother
wavelet $\psi(t) = 100 \cdot t^2 (t-1)^2 (t^2 - t + 3/14)\,\mathbb{1}_{0 \le t \le 1}$ satisfying assumption W(5/2).
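
As a sanity check on the mother wavelet quoted in the last item, its first moments over $[0, 1]$ can be computed exactly with rational arithmetic. The sketch below is illustrative (the helper names are hypothetical); it only verifies that $\int \psi = 0$ and $\int t\,\psi(t)\,dt = 0$, and does not check the regularity condition in assumption W(5/2), whose precise statement is not reproduced here.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# psi(t) = 100 * t^2 * (t - 1)^2 * (t^2 - t + 3/14) on [0, 1]
T_SQUARED = [Fraction(0), Fraction(0), Fraction(1)]
T_MINUS_1_SQUARED = [Fraction(1), Fraction(-2), Fraction(1)]
QUADRATIC = [Fraction(3, 14), Fraction(-1), Fraction(1)]
PSI = [100 * c for c in poly_mul(poly_mul(T_SQUARED, T_MINUS_1_SQUARED), QUADRATIC)]


def moment(order):
    """Exact value of the integral of t**order * psi(t) over [0, 1]."""
    return sum(c / (k + order + 1) for k, c in enumerate(PSI))


if __name__ == "__main__":
    for order in range(3):
        print(order, moment(order))
```

The constant $3/14$ is exactly what makes the zeroth moment vanish; the first moment then vanishes by symmetry of $\psi$ about $t = 1/2$, while the second moment is small but non-zero.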

Software (written in Matlab) for computing some of these estimators is available on the Internet (see the
website of D. Veitch http://wwww.cubinlab.ee.mu.oz.au/~darryl/ for $\widehat{D}_{ATV}$ and the homepage of E.
Moulines http://www.tsi.enst.fr/~moulines/ for $\widehat{D}_{MS}$ and $\widehat{D}_R$). The other software is
available on

Simulation results are reported in Table 4.



Comments on the results of Table 4: These simulations make it possible to distinguish four "clusters" of
estimators.

• $\widehat{D}_{BGK}$ is obtained from a BIC-criterion hierarchical model selection (from 2 to 11 parameters,
corresponding to the length of the approximation of the Fourier expansion of the spectral density) using
Whittle estimation. For these simulations, the BIC criterion is generally minimal for 5 to 7 parameters
to be estimated. Simulation results are not very satisfactory except for $D = 0.1$ (close to short
memory). Moreover, this procedure is rather time-consuming.

• $\widehat{D}_{GRS}$ offers good results for fGn and FARIMA(0, d, 0). However, this estimator does not converge
fast enough for the other processes.

• Estimators $\widehat{D}_{MS}$ and $\widehat{D}_R$ have similar properties. They (especially $\widehat{D}_R$) are
very interesting because they offer the same fairly good rates of convergence for all processes of the benchmark.

• Being built on similar principles, estimators $\widehat{D}_{ATV}$ and $\widetilde{D}_N$ have similar behavior as
well. Their convergence rates are the fastest for fGn and FARIMA(0, d, 0) and are close to the fastest ones for
the other processes. Their computation times are the shortest, especially for $\widehat{D}_{ATV}$, whose wavelet
coefficients are computed with the Mallat algorithm.

Conclusion: Which estimator among those studied above should be chosen in practice, i.e. for an observed time
series? We propose the following procedure for estimating a possible long memory parameter:

1. Firstly, since this procedure is very fast and applicable to processes with smooth trends, draw the log-log
regression of the wavelet coefficients' variances onto scales. If a linear zone appears in this graph, consider
the estimator $\widetilde{D}_N$ (or $\widehat{D}_{ATV}$) of $D$.

2. If a linear zone appears in the previous graph and if the observed time series seems to be without a
trend, compute $\widehat{D}_R$.

3. Compare both estimated values of $D$ using confidence intervals (available for $\widetilde{D}_N$ or
$\widehat{D}_{ATV}$ and $\widehat{D}_R$).
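
Step 1 of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the wavelet coefficients are computed at integer scales with non-overlapping shifts, the scales are chosen by hand (no adaptive selection of the onset of scaling), and all function names are hypothetical. The mother wavelet is the polynomial one quoted in Section 4.2.

```python
import math
import random


def psi(t):
    """Mother wavelet quoted in the text, supported on [0, 1]."""
    if t < 0.0 or t > 1.0:
        return 0.0
    return 100.0 * t ** 2 * (t - 1.0) ** 2 * (t ** 2 - t + 3.0 / 14.0)


def wavelet_variance(x, a):
    """Sample variance of wavelet coefficients at integer scale a, using
    non-overlapping shifts 0, a, 2a, ... (an assumption of this sketch)."""
    weights = [psi(k / a) / math.sqrt(a) for k in range(a)]
    n_blocks = len(x) // a
    total = 0.0
    for p in range(n_blocks):
        start = p * a
        coeff = sum(w * x[start + k] for k, w in enumerate(weights))
        total += coeff * coeff
    return total / n_blocks


def loglog_slope(x, scales):
    """Ordinary least-squares slope of log(variance) against log(scale)."""
    u = [math.log(a) for a in scales]
    v = [math.log(wavelet_variance(x, a)) for a in scales]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    return sxy / sxx


if __name__ == "__main__":
    random.seed(0)
    white_noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
    # For white noise (D = 0) the fitted slope should be close to 0.
    print(loglog_slope(white_noise, list(range(5, 55, 5))))
```

The slope of the fitted line plays the role of the estimate of $D$; in practice the plot itself should be inspected first, as recommended in step 1, before trusting the regression.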

5 Proofs 

Proof of Property 1. The arguments of this proof are similar to those of Abry et al. (1998) or Moulines et al.

with $C_{\beta,L} > 0$ only depending on $\beta$ and $L$ (see for instance Devore and Lorentz, 1993). Therefore if
$\psi$ satisfies Assumption W($\infty$) and $X$ Assumption A1, for all $\beta > 1/2$, since
$\sup_{u \in \mathbb{R}} |\widehat{\psi}(u)| < \infty$,

since $\sup_{u \in \mathbb{R}} (1 + u^n)\,|\widehat{\psi}(u)| < \infty$ for all $n \in \mathbb{N}$. Consequently, if
$\psi$ satisfies Assumption W($\infty$), for all $n > 0$, for all $a \in \mathbb{N}^*$, there exists $C(n) > 0$ not
depending on $a$ such that

Now, by choosing $p$ such that $1 - p < D - D'$, the inequality (6) is obtained. □



Proof of Property 2. Using the proof of the previous Property 1 with Assumption W(5/2), $\psi$ is included in a
Sobolev space $W(5/2, L)$, inequality (19) is checked with $\beta = 5/2$ and (20) is replaced by

since $\sup_{u \in \mathbb{R}} (1 + u^{3/2})\,|\widehat{\psi}(u)| < \infty$. Therefore, inequality (21) is replaced by

The end of the proof is similar to the end of the previous proof, but now $K_{(\psi,c)}$ exists for $-2 < c < 1$ and

Finally, under Assumption A1', for all $a \in \mathbb{N}^*$, since $-2 < D - D' < 1$,

$$E\big(e^2(a,0)\big) \sim f^*(0)\,K_{(\psi,D)}\,a^{D},$$

which completes the proof. □



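Proof of Corollary 1. Both these proofs provide the main arguments to establish (7). For better readability,
we will consider only Assumption A1' and Assumption W($\infty$) (the long memory process being similar).

The asymptotic behavior of $\widehat{\psi}(u)$ when $u \to \infty$ ($\psi$ is considered to satisfy Assumption
W($\infty$)) induces that

$$\int_{\sqrt{a}}^{a\pi} f\Big(\frac{u}{a}\Big)\,|\widehat{\psi}(u)|^2\,du \;\le\; C\,a^{D} \int_{\sqrt{a}}^{\infty} u^{-D}\,|\widehat{\psi}(u)|^2\,du \;\le\; \frac{C(n)}{a^{n}} \qquad (24)$$

for all $n \in \mathbb{N}$. Moreover,

$$\int_{-\sqrt{a}}^{\sqrt{a}} f\Big(\frac{u}{a}\Big)\,|\widehat{\psi}(u)|^2\,du = f^*(0) \int_{-\sqrt{a}}^{\sqrt{a}} \Big( \Big|\frac{u}{a}\Big|^{-D} + C_{D'} \Big|\frac{u}{a}\Big|^{D'-D} \Big) |\widehat{\psi}(u)|^2\,du + \int_{-\sqrt{a}}^{\sqrt{a}} \Big( f\Big(\frac{u}{a}\Big) - f^*(0) \Big( \Big|\frac{u}{a}\Big|^{-D} + C_{D'} \Big|\frac{u}{a}\Big|^{D'-D} \Big) \Big) |\widehat{\psi}(u)|^2\,du. \qquad (25)$$

From computations of previous proofs,

$$\int_{-\sqrt{a}}^{\sqrt{a}} \Big( \Big|\frac{u}{a}\Big|^{-D} + C_{D'} \Big|\frac{u}{a}\Big|^{D'-D} \Big) |\widehat{\psi}(u)|^2\,du = K_{(\psi,D)} \cdot a^{D} + C_{D'}\,K_{(\psi,D-D')} \cdot a^{D-D'} + \Lambda(a), \qquad (26)$$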
with, for all $u \in [-\sqrt{a}, \sqrt{a}]$, $g(u,a) \to 0$ when $a \to \infty$. Therefore, from the Lebesgue Theorem
(checked from the asymptotic behavior of $\widehat{\psi}$),

As a consequence, from (25), (26) and (27), the corollary is proven. □
Proof of Proposition 1. This proof can be decomposed into three steps: Step 1, Step 2 and Step 3.

Step 1. In this part, $\frac{N}{a_N} \cdot \big(\mathrm{Cov}(\tilde{T}_N(r_i a_N), \tilde{T}_N(r_j a_N))\big)_{1 \le i,j \le \ell}$
is proven to converge to an asymptotic covariance matrix $\Gamma$. First, for all $(i,j) \in \{1, \ldots, \ell\}^2$,

$$\mathrm{Cov}\big(\tilde{T}_N(r_i a_N), \tilde{T}_N(r_j a_N)\big) = 2 \cdot \frac{1}{[N/r_i a_N]} \cdot \frac{1}{[N/r_j a_N]} \sum_{p=1}^{[N/r_i a_N]} \sum_{q=1}^{[N/r_j a_N]} \Big( \mathrm{Cov}\big(\tilde{e}(r_i a_N, p), \tilde{e}(r_j a_N, q)\big) \Big)^2, \qquad (28)$$

because $X$ is a Gaussian process. Therefore, by considering only $i = j$ and $p = q$, for $N$ and $a_N$ large enough,

1 N 



Cov(T N {naN),T N (riaN)) > 



n a N 



(29) 



Now, for (p, q)e{l,..., [N/na N }} x {1, . . . , [N/na N ]}, 



Cov(e{r i a N ,p),e(r j a N ,q)) =■ 



f*{0)K/^^ D ) naN TjCLN - 1 ~ ' naN rjdN 



Y Y '4>{^—)'4 1 {-^—)r(k - k' + a N (r,ip - rjq)) 



1-D 
1 N 



(nrj^-vV 2 1 1 



ridN r j a N 



k 



A-' 



/*(0)i^(^ i£) ) rjOAr r^ajv ^ na N K r j a N / J_ 7T 
(r^Y 1 -^/ 2 1 1 



} -i\(k-k / +a N (rip-rjq)) 



VidN r j a N 



k' 



a N f*{^) K (4>,D) r i a N TjQN j~ naN rja N 



^ ^- ^ ^T-H.At' T'fl.AT I AT 



-+r i p-r j q) 



Using the same expansion as in (21), under Assumption W($\infty$) the previous equality becomes, for all
$n \in \mathbb{N}^*$,

with $C(n), C'(n), C''(n) > 0$ not depending on $a_N$ and due to the asymptotic behaviors of $\widehat{\psi}(u)$ when
$u \to 0$ and $u \to \infty$. Now, under Assumption A1,

Hence, with (28),
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[N/ria N ] [N/rja N 



< c- 



-_ y y ( ?— 

-j-ajv] ^ ^-f V(l+ np-r^ 

p— 1 g — 1 x 1 J - 



[N/na N ] [N/r 

But, from the theorem of comparison between sums and integrals, 

[N/na N ] [N/ rj a N ] 

E E ( 1 + \np-r j q\)- 2 < 

p=l q=l 

< 



< 



TV 1 TV 
As a consequence, if a/v is such that limsup ^Tv < 00 th en limsup — 

N— oo aN CL N TV— oo 

N 1 

More precisely, since this covariance is a sum of positive terms, if limsup 2757 

jv^oo aN a N 



,2D' 
N 





pN/a N 


/■iV/ojy 


nrj 


10 


Jo (1 


2 


pN/a N 


iV/ a^v 


nrj 


10 


(1 + w) 2 


2 


TV 






a N 





Cov(f N (ria N ), f N (rja N )) 



lim 

N— >oo ajv 



N f ~ ~ \ 

— (Cov(SN(riaN),SN(rjaN))) =T(n,--- ,ri,ij),D), 



< oo. 



(36) 



a non null (from (|29[) ) symmetric matrix with r(r*i, • • • ,r£,ip,D) = (~/ij)i<i.j<e that can be specified. Indeed, 

TV 1 



from the previous computations, if limsup 



2D' 



0, 



Srr .„„ Wna N ] [N/r 3 a N ] ){1 _ D)/2 

ii m 8?t? j QjV 



jv-> oo TV 



A', 



1 i)(uri)ip(urj) 

d,U — COS 



,D 



(u(np - rjq)fj 



8(n rj y- u a N 



[JV/dyOjv]-! 



E ( 

W.-D) m=-[JV/d ij a lv ] + l 



TV 



dijdN 



du p — cos (it dij-m) 



K 2 d 



E 



cos(udijm) du 



with $d_{ij} = \mathrm{GCD}(r_i, r_j)$. Therefore, the matrix $\Gamma$ depends only on $r_1, \ldots, r_\ell, \psi, D$.



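Step 2. Generally speaking, the above result is not sufficient to obtain the central limit theorem,

$$\sqrt{\frac{N}{a_N}} \Big( \tilde{T}_N(r_i a_N) - E\big(\tilde{e}^2(r_i a_N, 0)\big) \Big)_{1 \le i \le \ell} \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}_\ell\big(0, \Gamma(r_1, \ldots, r_\ell, \psi, D)\big). \qquad (37)$$

However, each $\tilde{T}_N(r_i a_N)$ is a quadratic form of a Gaussian process. Mutatis mutandis, it is exactly the
same framework (i.e. a Lindeberg central limit theorem) as that of Proposition 2.1 in Bardet (2000), and (37) is
checked. Moreover, if $(a_n)_n$ is such that $\limsup_{N \to \infty} N / a_N^{1+2D'} = 0$, then the asymptotic
behavior of $E(\tilde{e}^2(r_i a_N, 0))$ provided in Property 1 applies.

As a consequence, under those assumptions,

$$\sqrt{\frac{N}{a_N}} \Big( \tilde{T}_N(r_i a_N) - 1 \Big)_{1 \le i \le \ell} \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}_\ell\big(0, \Gamma(r_1, \ldots, r_\ell, \psi, D)\big). \qquad (38)$$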
Step 3. The logarithm function $(x_1, \ldots, x_\ell) \in (0, +\infty)^\ell \mapsto (\log x_1, \ldots, \log x_\ell)$ is
$C^2$ on $(0, +\infty)^\ell$. As a consequence, using the Delta-method, the central limit theorem (10) for the vector
$\big(\log \tilde{T}_N(r_i a_N)\big)_{1 \le i \le \ell}$ follows with the same asymptotic covariance matrix
$\Gamma(r_1, \ldots, r_\ell, \psi, D)$ (because the Jacobian matrix of the function at $(1, \ldots, 1)$ is the identity
matrix). □
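
The Delta-method argument of Step 3 can be spelled out as follows. This is only a sketch in the notation used above (the name $g$ for the componentwise logarithm is introduced here for illustration), assuming that the normalized statistics converge to 1 as in (38).

```latex
% Componentwise logarithm and its Jacobian matrix
g(x_1,\dots,x_\ell) = (\log x_1,\dots,\log x_\ell), \qquad
J_g(x_1,\dots,x_\ell) = \mathrm{diag}\Big(\frac{1}{x_1},\dots,\frac{1}{x_\ell}\Big), \qquad
J_g(1,\dots,1) = I_\ell .

% Delta-method applied to the central limit theorem (38); note g(1,\dots,1) = 0
\sqrt{\frac{N}{a_N}}\,\Big(\log \tilde{T}_N(r_i a_N)\Big)_{1\le i\le \ell}
\;\xrightarrow[N\to\infty]{\mathcal{D}}\;
\mathcal{N}_\ell\big(0,\; J_g(1,\dots,1)\,\Gamma\,J_g(1,\dots,1)'\big)
= \mathcal{N}_\ell(0,\Gamma).
```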



Proof of Proposition 2. This proof follows exactly that of Proposition 1, both being based on the approximations
of Fourier transforms provided in the proof of Property 2. □

Proof of Corollary 3. It is clear that $X'_t = X_t + P_m(t)$ for all $t \in \mathbb{Z}$, with $X = (X_t)_t$ satisfying
Propositions 1 and 2. But any wavelet coefficient of $(P_m(t))_t$ is obviously null from the assumption on $\psi$.
Therefore the statistic $T_N$ is the same for $X$ and $X'$. □

Proof [Proposition [5] Let e > be a fixed positive real number, such that a* + e < 1. 



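I. First, a bound of $\Pr(\widehat{\alpha}_N \le \alpha^* + \varepsilon)$ is provided. Indeed,

$$\Pr(\widehat{\alpha}_N \le \alpha^* + \varepsilon) \ge \Pr\Big( Q_N(\alpha^* + \varepsilon/2) < \min_{\alpha \ge \alpha^* + \varepsilon \text{ and } \alpha \in \mathcal{A}_N} Q_N(\alpha) \Big)$$
$$\ge 1 - \Pr\Big( \bigcup_{\alpha \ge \alpha^* + \varepsilon \text{ and } \alpha \in \mathcal{A}_N} \big\{ Q_N(\alpha^* + \varepsilon/2) > Q_N(\alpha) \big\} \Big)$$
$$\ge 1 - \sum_{k = [(\alpha^* + \varepsilon) \log N]}^{\log[N/\ell]} \Pr\Big( Q_N(\alpha^* + \varepsilon/2) > Q_N\Big(\frac{k}{\log N}\Big) \Big). \qquad (39)$$

But, for $\alpha \ge \alpha^* + \varepsilon$,

$$\Pr\big( Q_N(\alpha^* + \varepsilon/2) > Q_N(\alpha) \big) = \Pr\Big( \big\| P_N(\alpha^* + \varepsilon/2) \cdot Y_N(\alpha^* + \varepsilon/2) \big\|^2 > \big\| P_N(\alpha) \cdot Y_N(\alpha) \big\|^2 \Big)$$

with $P_N(\alpha) = I_\ell - A_N(\alpha) \cdot (A'_N(\alpha) \cdot A_N(\alpha))^{-1} \cdot A'_N(\alpha)$ for all $\alpha \in (0, 1)$,
i.e. $P_N(\alpha)$ is the matrix of an orthogonal projection on the orthogonal subspace (in $\mathbb{R}^\ell$) of the
subspace generated by $A_N(\alpha)$ (and $I_\ell$ is the identity matrix in $\mathbb{R}^\ell$). From the expression of
$A_N(\alpha)$, it is obvious that for all $\alpha \in (0, 1)$,

$$P_N(\alpha) = P = I_\ell - A \cdot (A' \cdot A)^{-1} \cdot A',$$

with the matrix
$$A = \begin{pmatrix} \log(r_1) & 1 \\ \vdots & \vdots \\ \log(r_\ell) & 1 \end{pmatrix}$$
as in Proposition 3. Thereby,

$$\Pr\big( Q_N(\alpha^* + \varepsilon/2) > Q_N(\alpha) \big) = \Pr\Big( \big\| P \cdot Y_N(\alpha^* + \varepsilon/2) \big\|^2 > \big\| P \cdot Y_N(\alpha) \big\|^2 \Big)$$
$$= \Pr\Big( \Big\| P \cdot \sqrt{\tfrac{N}{N^{\alpha^* + \varepsilon/2}}}\, Y_N(\alpha^* + \varepsilon/2) \Big\|^2 > N^{\alpha - (\alpha^* + \varepsilon/2)} \Big\| P \cdot \sqrt{\tfrac{N}{N^{\alpha}}}\, Y_N(\alpha) \Big\|^2 \Big)$$
$$\le \Pr\Big( V_N(\alpha^* + \varepsilon/2) \ge N^{(\alpha - (\alpha^* + \varepsilon/2))/2} \Big) + \Pr\Big( V_N(\alpha) \le N^{-(\alpha - (\alpha^* + \varepsilon/2))/2} \Big)$$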

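with $V_N(\alpha) = \big\| P \cdot \sqrt{N/N^{\alpha}}\; Y_N(\alpha) \big\|^2$ for all $\alpha \in (0, 1)$. From Proposition 1,
for all $\alpha > \alpha^*$, the asymptotic law of $P \cdot \sqrt{N/N^{\alpha}}\; Y_N(\alpha)$ is a Gaussian law with
covariance matrix $P \cdot \Gamma \cdot P'$. Moreover, the rank of the matrix $P \cdot \Gamma \cdot P'$ is $\ell - 2$ (this is
the rank of $P$) and there exists $0 < \lambda_-$, not depending on $N$, such that
$P \cdot \Gamma \cdot P' - \lambda_- P \cdot P'$ is a non-negative matrix ($0 < \lambda_- < \min\{\lambda \in Sp(\Gamma)\}$).
As a consequence, for a large enough $N$,

$$\Pr\Big( V_N(\alpha) \le N^{-(\alpha - (\alpha^* + \varepsilon/2))/2} \Big) \le 2 \cdot \Pr\Big( V_- \le N^{-(\alpha - (\alpha^* + \varepsilon/2))/2} \Big) \le \frac{1}{2^{\ell/2 - 2}\,\Gamma(\ell/2)\,\lambda_-^{\ell/2 - 1}} \cdot N^{-(\frac{\ell}{2} - 1)\,\frac{\alpha - (\alpha^* + \varepsilon/2)}{2}}$$

with $V_- \sim \lambda_- \cdot \chi^2(\ell - 2)$. Moreover, from Markov inequality,

$$\Pr\Big( V_N(\alpha^* + \varepsilon/2) \ge N^{(\alpha - (\alpha^* + \varepsilon/2))/2} \Big) \le 2 \cdot \Pr\Big( \exp\big(\sqrt{V_+}\big) \ge \exp\big( N^{(\alpha - (\alpha^* + \varepsilon/2))/4} \big) \Big) \le 2 \cdot E\big( \exp(\sqrt{V_+}) \big) \cdot \exp\big( - N^{(\alpha - (\alpha^* + \varepsilon/2))/4} \big)$$

with $V_+ \sim \lambda_+ \cdot \chi^2(\ell - 2)$ and $\lambda_+ > \max\{\lambda \in Sp(\Gamma)\} > 0$. Since
$E(\exp(\sqrt{V_+})) < \infty$ does not depend on $N$, there exists $M_1 > 0$ not depending on $N$, such that for
large enough $N$,

$$\Pr\big( Q_N(\alpha^* + \varepsilon/2) > Q_N(\alpha) \big) \le M_1 \cdot N^{-(\frac{\ell}{2} - 1)\,\frac{\alpha - (\alpha^* + \varepsilon/2)}{2}}$$

and therefore, the inequality (39) becomes, for $N$ large enough,

$$\Pr(\widehat{\alpha}_N \le \alpha^* + \varepsilon) \ge 1 - M_1 \sum_{k = [(\alpha^* + \varepsilon) \log N]}^{\log[N/\ell]} N^{-\frac{\ell - 2}{4} \big( \frac{k}{\log N} - (\alpha^* + \varepsilon/2) \big)} \ge 1 - M_1 \cdot \log N \cdot N^{-\frac{(\ell - 2)\,\varepsilon}{8}}. \qquad (40)$$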



II. Secondly, a bound of $\Pr(\widehat{\alpha}_N \ge \alpha^* - \varepsilon)$ is provided. Following the above arguments
and notations,



Pr(SAT>a*-e) > Pr [Q N (a 



I - a* 

» e) < min QAr(a) 

a<a*~e and aeyijv 



[( Q *-£)l0gW] + l 1- * 

> 1- P*(QN(a* + ^^e)>Q 

k=2 



AM 



log N V 



(41) 
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and as above 
Pr 



(- I -a* 

[Qn{o* + — g) > Qjv(a) 



2a* 



= Pr 



P- 



iV 



p -y^ Y ^ a f)- (42) 



Now, in the case $a_N = N^{\alpha}$ with $\alpha \le \alpha^*$, the sample variance of wavelet coefficients is biased.
In this case, from the relation of Corollary 1 under Assumption A1',



Y N (a) 



C °' K ^-°')) iiNa) -n> {1 + 0i{1))] 



i<i<e \ f*(0)Kty tD ) 



i<i<e 



en {a) 



Ki<t 



with $o_i(1) \to 0$ when $N \to \infty$ for all $i$ and $E(Z_N(\alpha)) = 0$. As a consequence, for large enough $N$,



/ N 2 



P • e N (a) 



■N- 



P 



C ^ K D - D ' ]) i- D \l + o i{ l)) 



Ki<i 



with D > 0, because the vector (i D )i<i<e is not in the orthogonal subspace of the subspace generated by 
the matrix A. Then, the relation (H2I) becomes. 



1 - a* ~ 
Py[Q N (a* + —— £ )>Q N (a)) < Pr 



P- 



N 



1 - a* 



, Y N (a* + ^—e) 



> D ■ N a - {a * +1 ^ e) ■ N 



< Pr (v+>D- TV^^C 2 ^*-")-^ 

< M 2 • AT-d" 1 )^ 2 , 



1 - a 



1-a* 



2a* 



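with $M_2 > 0$, because $V_+ \sim \lambda_+ \cdot \chi^2(\ell - 2)$ and
$\frac{1 - \alpha^*}{2\alpha^*}\,\big(2(\alpha^* - \alpha) - \varepsilon\big) \ge \frac{1 - \alpha^*}{2\alpha^*}\,\varepsilon$
for all $\alpha \le \alpha^* - \varepsilon$. Hence from the inequality (41), for large enough $N$,

$$\Pr(\widehat{\alpha}_N \ge \alpha^* - \varepsilon) \ge 1 - M_2 \cdot \log N \cdot N^{-\frac{\ell - 2}{2}\,D'\,\varepsilon}. \qquad (43)$$

The inequalities (40) and (43) imply that $\Pr(|\widehat{\alpha}_N - \alpha^*| \ge \varepsilon) \to 0$ when $N \to \infty$. □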
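
The properties of the projection matrix $P = I_\ell - A (A'A)^{-1} A'$ used in this proof (idempotence, rank $\ell - 2$, and $P \cdot A = 0$) can be checked numerically. The sketch below is purely illustrative: the function name is hypothetical and the choice $r_i = i$ is an assumption made for the example.

```python
import math


def projection_matrix(r):
    """P = I - A (A'A)^{-1} A' for the design matrix A with rows (log r_i, 1).
    Requires at least two distinct values in r so that A'A is invertible."""
    ell = len(r)
    a = [[math.log(ri), 1.0] for ri in r]
    # Entries of the symmetric 2x2 matrix A'A, then of its inverse.
    s11 = sum(row[0] * row[0] for row in a)
    s12 = sum(row[0] for row in a)
    s22 = float(ell)
    det = s11 * s22 - s12 * s12
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    p = []
    for i in range(ell):
        row = []
        for j in range(ell):
            hat = sum(a[i][u] * inv[u][v] * a[j][v] for u in range(2) for v in range(2))
            row.append((1.0 if i == j else 0.0) - hat)
        p.append(row)
    return p


if __name__ == "__main__":
    ell = 15
    p = projection_matrix(list(range(1, ell + 1)))
    # The trace of an orthogonal projection equals its rank, here ell - 2.
    print(sum(p[i][i] for i in range(ell)))
```

Because $P$ annihilates both columns of $A$, any term of the form (slope) $\cdot \log r_i$ + (constant) disappears from $P \cdot Y_N(\alpha)$, which is why only the residual part of the regression enters the bounds above.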



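Proof of Theorem 1. The central limit theorem of (16) can be established from the following arguments. First,
$\Pr(\widetilde{\alpha}_N \ge \alpha^*) \to 1$ when $N \to \infty$. Following the previous proof, for all
$\varepsilon > 0$,

$$\Pr(\widehat{\alpha}_N \ge \alpha^* - \varepsilon) \ge 1 - M_2 \cdot \log N \cdot N^{-\frac{\ell - 2}{2}\,D'\,\varepsilon}.$$

Consequently, if $\varepsilon_N = \lambda \cdot \frac{\log \log N}{\log N}$ with $\lambda > \frac{2}{(\ell - 2) D'}$ then,

$$\Pr(\widehat{\alpha}_N \ge \alpha^* - \varepsilon_N) \ge 1 - M_2 \cdot \log N \cdot N^{-\frac{(\ell - 2) D'}{2}\,\lambda\,\frac{\log \log N}{\log N}} \ge 1 - M_2 \cdot (\log N)^{1 - \frac{\lambda (\ell - 2) D'}{2}}$$
$$\Longrightarrow \quad \Pr(\widehat{\alpha}_N + \varepsilon_N \ge \alpha^*) \xrightarrow[N \to \infty]{} 1.$$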
AT— >oo 
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Now, from Corollary @J f? N D' . Therefore, Pr(D' N < \d') — > 1. Thus, with A > 



3 ' AT ' - A(£- 2)D r ' 

Pr (ctN + (sn — ■ — r — > <**) — ► 1 which implies Prfa^r > a*) — > 1. 

Secondly, for $x \in \mathbb{R}$,



lim Pr (J-p^(D N - D) < x) = lim Pr f J —^(D N - D) < x(~]a N > 



+ ^ pr ( v {Dn D) - x n ^ a *) 

lirn^ y ^ Pr (\[§^{D N -D)<x) fs N (a) da 

lim Pr(Zr<x)- / / 3jv (a)da 
r-«x> V / J a . 



N 

Pr ( Z r < 



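with $f_{\widetilde{\alpha}_N}(\alpha)$ the probability density function of $\widetilde{\alpha}_N$ and $Z_\Gamma \sim \mathcal{N}\big(0\,;\, (A' \cdot A)^{-1} \cdot A' \cdot \Gamma \cdot A \cdot (A' \cdot A)^{-1}\big)$.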



To prove the second part of (16), we infer from the above that

„ / * ^ ~ ^ * , 3 loglogiV loglog, 

Pr a < a/v < a -\ — — • h u ■ 

V (£-2)D' N logN p logiV 

with > Therefore, v < ( e _2) D , + yz^, 

Pr (V Q * < N &N < N a ' ■ (logNy^ — > 1. 

TV— >oo 

This inequality and the previous central limit theorem result in : for all p > v/2, and e > 0, 



_/Vi+2D' - \ / J\fi( S K~a') j 



Pr ( , 1+2 ° ■ |Djv - D\> e] = Pr ( ; \l \D N - D\ > e 

\ (log N)p 1 1 J V (logJV)P V N a » 



— > 0. □ 
$N = 10^3$
$\sqrt{MSE}$                                      $\ell=5$     $\ell=10$    $\ell=15$    $\ell=20$    $\ell=25$    $\ell_1=15$, $\ell_2=\widetilde{\ell}$
fGn ($H = (D+1)/2$)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.16, 0.75   0.14, 0.19   0.13, 0.17   0.14, 0.15   0.14, 0.15   0.15, 0.18
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.12, 0.32   0.07, 0.13   0.05, 0.08   0.04, 0.05   0.04, 0.04   0.05, 0.08
FARIMA(0, D/2, 0)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.21, 0.81   0.15, 0.20   0.14, 0.17   0.15, 0.15   0.15, 0.15   0.15, 0.19
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.14, 0.34   0.07, 0.13   0.05, 0.09   0.05, 0.06   0.04, 0.04   0.05, 0.09
FARIMA(1, D/2, 0)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.30, 0.96   0.28, 0.35   0.27, 0.29   0.29, 0.27   0.30, 0.30   0.31, 0.35
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.19, 0.44   0.15, 0.24   0.12, 0.17   0.11, 0.15   0.11, 0.12   0.12, 0.17
FARIMA(1, D/2, 1)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.60, 0.92   0.43, 0.41   0.39, 0.35   0.36, 0.35   0.32, 0.33   0.21, 0.20
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.17, 0.38   0.11, 0.18   0.09, 0.12   0.07, 0.09   0.06, 0.07   0.09, 0.12
$X^{(D,D')}$, $D' = 1$
  $\widehat{D}_N$, $\widetilde{D}_N$              0.33, 0.68   0.29, 0.28   0.27, 0.26   0.26, 0.27   0.25, 0.25   0.29, 0.30
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.10, 0.22   0.10, 0.07   0.11, 0.07   0.12, 0.12   0.13, 0.13   0.11, 0.07

$N = 10^4$
$\sqrt{MSE}$                                      $\ell=5$     $\ell=10$    $\ell=15$    $\ell=20$    $\ell=25$    $\ell_1=15$, $\ell_2=\widetilde{\ell}$
fGn ($H = (D+1)/2$)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.08, 0.26   0.05, 0.05   0.05, 0.05   0.04, 0.04   0.04, 0.04   0.04, 0.04
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.08, 0.22   0.05, 0.06   0.04, 0.05   0.04, 0.05   0.05, 0.05   0.04, 0.05
FARIMA(0, D/2, 0)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.08, 0.31   0.06, 0.06   0.05, 0.05   0.05, 0.05   0.05, 0.05   0.05, 0.05
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.09, 0.24   0.05, 0.07   0.04, 0.05   0.04, 0.05   0.05, 0.05   0.04, 0.05
FARIMA(1, D/2, 0)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.13, 0.57   0.10, 0.10   0.09, 0.08   0.09, 0.08   0.09, 0.09   0.09, 0.08
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.15, 0.36   0.09, 0.16   0.08, 0.11   0.07, 0.09   0.06, 0.08   0.08, 0.11
FARIMA(1, D/2, 1)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.22, 0.63   0.17, 0.15   0.16, 0.13   0.15, 0.14   0.15, 0.14   0.09, 0.09
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.16, 0.38   0.11, 0.17   0.08, 0.11   0.07, 0.09   0.06, 0.07   0.08, 0.11
$X^{(D,D')}$, $D' = 1$
  $\widehat{D}_N$, $\widetilde{D}_N$              0.23, 0.36   0.19, 0.15   0.18, 0.17   0.17, 0.17   0.15, 0.14   0.15, 0.14
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.10, 0.18   0.12, 0.08   0.13, 0.12   0.14, 0.14   0.15, 0.15   0.13, 0.12

$N = 10^5$
$\sqrt{MSE}$                                      $\ell=5$     $\ell=10$    $\ell=15$    $\ell=20$    $\ell=25$    $\ell_1=15$, $\ell_2=\widetilde{\ell}$
fGn ($H = (D+1)/2$)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.04, 0.09   0.03, 0.03   0.02, 0.03   0.02, 0.02   0.02, 0.02   0.02, 0.02
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.07, 0.16   0.06, 0.04   0.06, 0.06   0.07, 0.07   0.07, 0.07   0.06, 0.06
FARIMA(0, D/2, 0)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.03, 0.13   0.02, 0.02   0.02, 0.02   0.02, 0.02   0.02, 0.02   0.02, 0.02
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.07, 0.18   0.04, 0.05   0.04, 0.03   0.04, 0.04   0.05, 0.05   0.04, 0.03
FARIMA(1, D/2, 0)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.05, 0.25   0.05, 0.04   0.04, 0.03   0.04, 0.03   0.04, 0.04   0.03, 0.02
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.12, 0.30   0.07, 0.12   0.05, 0.07   0.04, 0.06   0.04, 0.05   0.05, 0.07
FARIMA(1, D/2, 1)
  $\widehat{D}_N$, $\widetilde{D}_N$              0.08, 0.30   0.06, 0.04   0.05, 0.04   0.05, 0.04   0.05, 0.05   0.04, 0.03
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.13, 0.33   0.09, 0.15   0.08, 0.11   0.07, 0.09   0.06, 0.08   0.08, 0.11
$X^{(D,D')}$, $D' = 1$
  $\widehat{D}_N$, $\widetilde{D}_N$              0.13, 0.19   0.11, 0.08   0.10, 0.08   0.09, 0.09   0.09, 0.09   0.08, 0.07
  $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$    0.09, 0.15   0.10, 0.07   0.11, 0.09   0.12, 0.11   0.13, 0.13   0.11, 0.09

Table 1: Consistency of estimators $\widehat{D}_N$, $\widetilde{D}_N$, $\widehat{\alpha}_N$, $\widetilde{\alpha}_N$
following $\ell$ from simulations of the different long-memory processes of the benchmark. For each value of $N$
($10^3$, $10^4$ and $10^5$), of $D$ (0.1, 0.3, 0.5, 0.7 and 0.9) and $\ell$ (5, 10, 15, 20, 25 and
$(15, \widetilde{\ell})$), 100 independent samples of each process are generated. The $\sqrt{MSE}$ of each
estimator is obtained from a mean of $\sqrt{MSE}$ obtained for the different values of $D$.
$\sqrt{MSE}$ of $\widehat{D}_N$, $\widetilde{D}_N$
             FARIMA(0, -0.25, 0)   $X^{(-1,1)}$   $X^{(-1,3)}$   $X^{(-3,1)}$   $X^{(-3,3)}$
$N = 10^3$   0.15, 0.20            0.30, 0.30     0.38, 0.37     0.36, 0.37     0.39, 0.38
$N = 10^4$   0.04, 0.04            0.15, 0.14     0.08, 0.08     0.13, 0.14     0.13, 0.13
$N = 10^5$   0.03, 0.03            0.06, 0.05     0.04, 0.03     0.04, 0.04     0.03, 0.03

Table 2: Estimation of the memory parameter from 100 independent samples in case of short memory ($D < 0$).

$\sqrt{MSE}$ of $\widehat{D}_N$, $\widetilde{D}_N$
             P1           P2           P3           P4
$N = 10^3$   0.22, 0.23   0.32, 0.41   0.47, 0.76   0.40, 0.41
$N = 10^4$   0.06, 0.06   0.18, 0.28   0.24, 0.65   0.13, 0.13
$N = 10^5$   0.02, 0.02   0.02, 0.02   0.14, 0.47   0.03, 0.04

Table 3: Estimation of the long-memory parameter from 100 independent samples in case of processes P1 to P4
defined above.
$N = 10^3$
                          D = 0.1   D = 0.3   D = 0.5   D = 0.7   D = 0.9
fGn ($H = (D+1)/2$)
  $\widehat{D}_{BGK}$     0.089     0.171     0.259     0.341     0.369
  $\widehat{D}_{GRS}$     0.114     0.132     0.147     0.155     0.175
  $\widehat{D}_{MS}$      0.163     0.169     0.181     0.195     0.191
  $\widehat{D}_R$         0.211     0.220     0.215     0.218     0.128
  $\widehat{D}_{ATV}$     0.176     0.153     0.156     0.164     0.162
  $\widetilde{D}_N$       0.139     0.147     0.133     0.140     0.150
FARIMA(0, D/2, 0)
  $\widehat{D}_{BGK}$     0.094     0.138     0.239     0.326     0.413
  $\widehat{D}_{GRS}$     0.131     0.139     0.150     0.150     0.162
  $\widehat{D}_{MS}$      0.172     0.167     0.174     0.197     0.188
  $\widehat{D}_R$         0.246     0.189     0.223     0.234     0.181
  $\widehat{D}_{ATV}$     0.128     0.107     0.081     0.074     0.065
  $\widetilde{D}_N$       0.161     0.146     0.149     0.149     0.161
FARIMA(1, D/2, 0)
  $\widehat{D}_{BGK}$     0.146     0.203     0.239     0.236     0.212
  $\widehat{D}_{GRS}$     0.519     0.545     0.588     0.585     0.830
  $\widehat{D}_{MS}$      0.235     0.258     0.256     0.252     0.249
  $\widehat{D}_R$         0.242     0.241     0.234     0.202     0.144
  $\widehat{D}_{ATV}$     0.248     0.267     0.280     0.268     0.375
  $\widetilde{D}_N$       0.340     0.319     0.314     0.315     0.334
FARIMA(1, D/2, 1)
  $\widehat{D}_{BGK}$     0.204     0.253     0.342     0.363     0.384
  $\widehat{D}_{GRS}$     0.901     0.894     0.866     0.870     0.893
  $\widehat{D}_{MS}$      0.181     0.175     0.180     0.185     0.181
  $\widehat{D}_R$         0.204     0.200     0.200     0.191     0.130
  $\widehat{D}_{ATV}$     0.392     0.380     0.371     0.343     0.355
  $\widetilde{D}_N$       0.170     0.218     0.225     0.226     0.213
$X^{(D,D')}$, $D' = 1$
  $\widehat{D}_{BGK}$     0.090     0.139     0.261     0.328     0.388
  $\widehat{D}_{GRS}$     0.342     0.339     0.331     0.300     0.315
  $\widehat{D}_{MS}$      0.176     0.178     0.182     0.166     0.177
  $\widehat{D}_R$         0.219     0.232     0.231     0.173     0.167
  $\widehat{D}_{ATV}$     0.153     0.161     0.168     0.176     0.176
  $\widetilde{D}_N$       0.284     0.294     0.293     0.292     0.288

$N = 10^4$
                          D = 0.1   D = 0.3   D = 0.5   D = 0.7   D = 0.9
fGn ($H = (D+1)/2$)
  $\widehat{D}_{BGK}$     0.062     0.143     0.182     0.171     0.182
  $\widehat{D}_{GRS}$     0.040     0.047     0.054     0.068     0.066
  $\widehat{D}_{MS}$      0.069     0.064     0.061     0.071     0.063
  $\widehat{D}_R$         0.063     0.055     0.058     0.063     0.052
  $\widehat{D}_{ATV}$     0.036     0.042     0.041     0.047     0.045
  $\widetilde{D}_N$       0.050     0.040     0.041     0.039     0.040
FARIMA(0, D/2, 0)
  $\widehat{D}_{BGK}$     0.059     0.141     0.195     0.187     0.178
  $\widehat{D}_{GRS}$     0.042     0.048     0.050     0.046     0.057
  $\widehat{D}_{MS}$      0.072     0.055     0.066     0.059     0.065
  $\widehat{D}_R$         0.073     0.053     0.064     0.057     0.059
  $\widehat{D}_{ATV}$     0.026     0.038     0.039     0.032     0.022
  $\widetilde{D}_N$       0.053     0.050     0.056     0.055     0.044
FARIMA(1, D/2, 0)
  $\widehat{D}_{BGK}$     0.085     0.148     0.146     0.164     0.120
  $\widehat{D}_{GRS}$     0.179     0.175     0.182     0.192     0.190
  $\widehat{D}_{MS}$      0.109     0.105     0.099     0.100     0.094
  $\widehat{D}_R$         0.063     0.059     0.057     0.054     0.054
  $\widehat{D}_{ATV}$     0.118     0.101     0.088     0.120     0.081
  $\widetilde{D}_N$       0.095     0.085     0.093     0.081     0.097
FARIMA(1, D/2, 1)
  $\widehat{D}_{BGK}$     0.111     0.201     0.189     0.202     0.181
  $\widehat{D}_{GRS}$     0.308     0.321     0.306     0.314     0.311
  $\widehat{D}_{MS}$      0.070     0.064     0.065     0.064     0.069
  $\widehat{D}_R$         0.063     0.057     0.060     0.064     0.052
  $\widehat{D}_{ATV}$     0.114     0.118     0.103     0.102     0.093
  $\widetilde{D}_N$       0.095     0.099     0.087     0.101     0.090
$X^{(D,D')}$, $D' = 1$
  $\widehat{D}_{BGK}$     0.069     0.110     0.204     0.190     0.197
  $\widehat{D}_{GRS}$     0.192     0.185     0.172     0.177     0.190
  $\widehat{D}_{MS}$      0.083     0.059     0.071     0.066     0.068
  $\widehat{D}_R$         0.066     0.057     0.068     0.054     0.064
  $\widehat{D}_{ATV}$     0.124     0.131     0.139     0.147     0.153
  $\widetilde{D}_N$       0.158     0.143     0.152     0.158     0.155

Table 4: Comparison of the different long-memory parameter estimators for processes of the benchmark. For
each process and value of $D$ and $N$, $\sqrt{MSE}$ are computed from 100 independent generated samples.
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